How do you measure the success of an Agile transformation?

Wednesday, June 17, 2015
posted by daveb

If I could choose just one measure (and it’s probably a good idea to have just one measure) I’d choose Cycle Time – time (in days) to turn a request or requirement into delivered business value (i.e. in production).

This is a very objective measure that is hard to kid yourself on; it’s easy to measure, and it has direct meaning to all stakeholders.
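
For illustration, cycle time needs nothing more than the date a request was raised and the date the change reached production. A minimal sketch in Python (the work items here are hypothetical):

    from datetime import date

    # Hypothetical work items: (date requested, date delivered to production).
    work_items = [
        (date(2015, 5, 1), date(2015, 5, 20)),
        (date(2015, 5, 4), date(2015, 6, 2)),
        (date(2015, 5, 11), date(2015, 5, 28)),
    ]

    # Cycle time per item: days from request to delivered business value.
    cycle_times = [(done - raised).days for raised, done in work_items]

    print("Cycle times (days):", cycle_times)               # [19, 29, 17]
    print("Average:", sum(cycle_times) / len(cycle_times))  # ~21.7

Track the average (and the spread) over time – a transformation that is working should show it trending down.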

A Brief History of Agile

Monday, June 8, 2015
posted by daveb

Dozens of great books have been written around product and software development with lean/agile and related concepts over the last 40 years or so. While Agile is still young and evolving rapidly (bear in mind the term Agile was only coined in 2001), Lean and related concepts like the Theory of Constraints pre-date that term by decades. The Agile Manifesto was not the invention of lightweight process; it was just a milestone in its evolution towards mainstream adoption in software development. Very seldom have I seen an online resource even mention this fact – I guess the internet believes history started around 2000!

A small, random scattering of books to get some history on Agile and some insight into its origins:

The Mythical Man-Month (Brooks)
The Goal (Goldratt)
Extreme Programming Explained (Beck)
Lean Software Development (Poppendieck)
Peopleware (DeMarco & Lister)
The Toyota Way (Liker)
The Lean Startup (Ries)

Hopefully some of these will lead you to deeper insight than online resources and get you past the hyperbole of Agile.

What is the biggest weakness of Agile?

Monday, May 4, 2015
posted by daveb

Agile, as in The Agile Manifesto, is a very elegant way to express the learnings of many software experts over many years. The Agile Manifesto is a kind of declaration by leading software experts that there is another way to think about software development other than classic waterfall-based, documentation- and process-heavy methodologies that tend to see people only as resources. While these experts don’t attempt to put forward a single recommended methodology, they do agree on the principles of the Manifesto, and indeed they coined the term “Agile”.

The Agile Manifesto hints at what Agile is by comparing the Agile way to the traditional way – “Individuals and interactions over processes and tools” for example. If you haven’t heard of the Agile Manifesto, chances are you’ve picked up some highly distorted definition of Agile, such as “Agile means no documentation” or “Agile means Scrum” or “Agile means iterations”. The Agile Manifesto says nothing of the sort.

So to respond to the question “What’s the biggest weakness with Agile?” we first have to be clear about what we mean by Agile. Ideally, we’d use the definition from The Agile Manifesto, but unfortunately the term Agile has become very overloaded, and has been mis-applied and widely misunderstood. The real weakness with “Agile” (in the sense it’s commonly used today) is the very word itself! It’s almost unusable due to its many interpretations, connotations and so on.

For example, if you are trying to convince your organisation that it could benefit from Agile, it’s probably safer to avoid the term altogether than to risk being completely misunderstood by those who already have their own definition of Agile in mind.

Scrum in a BAU environment?

Monday, March 16, 2015
posted by daveb

Allow me to share a question, and my response to it, from LinkedIn’s Agile Coaching group.


Do you have any ideas or suggestions on how to improve the team focus while the Scrum (Ops) team works on several changes that sometimes don’t have a relationship?

Context: A mainframe environment of >6500 COBOL programmes where several highly complex regulations are gathered in one system.


We have a very similar problem – one Product Backlog, one Scrum Team, but the work that some team members do has little to no impact on what other team members do, to the point where they have independent test/release cycles. There are probably 3-4 separate sub-teams, sometimes with overlapping resources, working on separate products that occasionally interact.

In Scrum terms, Scrum of Scrums is the standard advice on how to scale. This may be of benefit to you if you have, say, more than 20 people involved, but for us, with 9, it seems like too much ceremony.

I think there is perhaps an absence of good advice from Scrum around how to run Scrum with a small team & a diverse range of products. Scrum can still work, but there’s more waste in the form of exposure to communications that have no bearing on your work.

I would question whether Scrum is the right fit. Scrum has a sweet spot that is around complex (single) product development efforts with team(s) of 3-9 members, where much of the work is in-sourced, and the team is co-located.

It may also be worth looking at Kanban, which has its sweet spot in product maintenance and support, because, chances are, if you have a small team and a diverse product portfolio to manage, you are in BAU (business-as-usual) mode.

Doing everything at once

Monday, March 2, 2015
posted by daveb

What happens when Requirements Analysis, Design, Development and Testing all happen at once, on the same features?

Common sense software project management says that Requirements Analysis, Design, Development and Testing are distinct Software Development Life Cycle (SDLC) phases that happen one at a time, each dependent on the step before it. This is what we commonly refer to nowadays as Waterfall (before Agile, I believe we simply called it Project Management).

Agile & Lean based methodologies point out that we can view a software product as a group of features, order them according to business value, then start work on Requirements Analysis for the highest-value items first. We then pass those over to Design & Development while we do Requirements Analysis for the next-highest-value feature, and so on, creating a pipeline with different features at different stages of delivery (a value stream).

But that’s not exactly what I’m talking about here – I’m talking about doing Requirements, Design, Development and Testing for a single feature at the same time.

Common sense also says that doing all of these things at once is not only a bad idea, but simply not possible. For instance, how can Testers test software that has not been Developed yet? (This common assumption is incorrect, because testers have far more value to add to a team than simply testing.)

Therefore, I was surprised to find that on a project I am currently working on the team had self-organised to do all four activities at the same time, for the same feature. And you know what? It was the most productive we have been in months!

So how did this work? Admittedly, Requirements Analysis had a head start on the other activities. Design commenced with the Requirements in an incomplete state – they were more or less correct, and covered most of the ground necessary to describe the business need, but completely lacked the depth of analysis necessary for the Design, Development & Testing activities. The Designer had no choice but to start by clarifying the Requirements, i.e. asking a lot of pointed questions of the business to make sure they had thought through what they appeared to be asking for, and looking to flesh out some shallow requirements with greater depth.

Meanwhile, the Developers had been able to make a start on the infrastructure parts of the solution, and they were able to start building and refining the solution as new information and clarity came from the Designer’s work. The Design & Development team – in fact, all team members – worked very closely together. Most communication was verbal.

At the same time the Tester was able to start working with the Designer and the Business Analyst in charge of the Requirements to start gaining an understanding of how the system was supposed to work, and what he would need in order to test it. He was able to start doing productive work well before the Development Team were anywhere near completion, and was able to question the Requirements and add clarity there.

The result: so far this appears to be working extremely well, although I won’t declare it a success until the project is delivered. The key improvement over the staggered waterfall approach is the quality and intensity of communication within the team. Everyone is talking about the requirements, the design, and how to test it, all the time. It’s a real buzz to work in an environment of rapid-fire discussion, thought and communication and it helps bring out the best everyone has to offer. This is how a team would work if it had an impossibly short time frame to deliver, and their lives depended on getting it done!

The irony of this is that we are supposed to be working in a Waterfall project management environment – but that was working so badly that the team members basically self-organised and did whatever gave us the best chance of success with a deadline rapidly approaching, and all eyes on our team.

In order to make this “Do everything at once” approach work, you need the following:

  • Co-located, small (3-9 member), cross-functional team.
  • Safe communication environment – anyone can say what they need to when they need to. There is respect for all opinions, no danger of a boss overhearing and suggesting we are not doing things the “right way”.
  • Professional, transparent, capable team members in every discipline.
  • A coordinator role on the team. I suggest the Designer fulfil this role, since they are in the best position to bridge all the disciplines. This role needs to communicate the constant stream of changes to all the team members.
  • No formal change control processes around the Requirements & Design activities.
  • No blame. Instead of blaming the business for its poor requirements or the lack of design documentation, dive in and make them better.

Would I recommend this approach? No, not as general advice, but in the right circumstances, it is definitely worth a try. At the very least it’s worth a try if you are falling behind schedule, everyone is blaming everyone else, no one is talking etc. – what have you got to lose?

On the Corruption of Agile

Monday, February 9, 2015
posted by daveb

While Scrum can claim great success in its adoption rate on product development and software projects, I would put forward that many of these Scrum projects, particularly the software projects, have not been anywhere near as successful as anyone had hoped. Based on my own observations, many attempts to install Scrum (especially on enterprise software development projects) have resulted in a gradual reversion back to pre-Scrum practices. These Scrum teams, if they are honest with themselves, may declare Scrum only a partial success at best. (If they are not honest, they will declare Scrum a success anyway!)

Agile practices, on the other hand, as originally outlined in the Agile Manifesto, and before that in Kent Beck’s work on Extreme Programming (see Beck’s book Extreme Programming Explained), have not been anywhere near as successful in their reach into mainstream enterprise culture as has the practice of Scrum.

Frequently we see Scrum adopted on software projects without co-adoption of Agile practices (continuous integration, automated testing, refactoring, etc.). This is unfortunate, because an iterative project management practice (Scrum) imposed upon a software project without the adoption of suitable software practices is a recipe for disaster.

Scrum teams that have not developed the Agile practices (and technical skills) that enable their software to be continuously modified will quickly find that their code base collapses under its own weight, as iterative change piled on top of change decays the quality of the code and gradually makes future change more and more difficult. Martin Fowler warned us about this in his blog post titled Flaccid Scrum.
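
To make that concrete, here’s a minimal sketch (hypothetical business rule and names) of the kind of automated test that lets a team keep modifying code, iteration after iteration, without the decay described above:

    import unittest

    def apply_discount(price, percent):
        # A business rule under constant iterative change.
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        return round(price * (1 - percent / 100.0), 2)

    class ApplyDiscountTest(unittest.TestCase):
        # Pins down current behaviour so the next change can't silently break it.
        def test_typical_discount(self):
            self.assertEqual(apply_discount(200.0, 25), 150.0)

        def test_rejects_invalid_percent(self):
            with self.assertRaises(ValueError):
                apply_discount(100.0, 150)

    if __name__ == "__main__":
        unittest.main()

A suite of tests like this, run on every change (continuous integration), is what makes iterative change sustainable.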

However, the Scrum “brand” continues to be strong, perhaps because of the success of Scrum in product development, and areas outside of software development where the lack of Agile technical practices is less relevant. The Agile “brand” (perhaps because of its direct connection with software development) on the other hand has taken the hit, with Agile almost a dirty word in some corporate environments.

This is the true tragedy. I put forward that it is much better to have the Agile practices adopted by your software team than it is to have achieved a Scrum implementation devoid of the necessary Agile technical practices. Scrum without Agile practices is a Pyrrhic victory.

Turning the pyramid upside down

Thursday, November 6, 2014
posted by daveb

Much has been written about managing change in large organisations. However, examples of successful change programmes are still far too rare. Even when change programmes succeed they take far too long, and cost far too much. Large organisations simply find it very hard to change the way they do things internally. This is because real change is impossible without the full buy-in of at least a majority of the staff members affected. People simply don’t like change – there’s often much that’s threatening about the kind of change that gets pushed from the top down, and little upside. Put yourself in the position of an administrator who has become expert in a system over 10 years of daily usage. Now try telling that person that the system will be thrown out in favour of a new system. That is of course a very threatening situation for someone whose feeling of worth at the organisation is based on their indispensable knowledge of the old system.

Leading change in hearts and minds is surely the foundation of successful organisational change, because at the end of the day changing an organisation’s operations involves changing the way people do things. The key is leadership, in its classical sense. Unfortunately it’s just this kind of leadership that large businesses are frequently very bad at. How many examples of great leaders can you identify around you in your workplace? How many individuals can you identify that you would follow if your life depended on their leadership? A true leader has the respect of their team, and gives respect back. They don’t need to force their will on their team because their team trusts them and feels their point of view is heard. You should consider yourself fortunate if you can identify just one such person in your work environment, and sadly you would be extremely fortunate to work in a team with such a person in the lead role. It shouldn’t be like this of course, but that’s the reality of corporate culture.

But in stark contrast to this, “IT leadership” in large organisations typically decides on an IT strategy and starts pushing changes downstream onto the organisation. Business users typically get involved in the requirements gathering process, and “IT leadership” calls this “listening to the users”. But this is like me asking you: “What colour would you like your new car?” before I’ve understood your transport needs. Furthermore, I’m a car enthusiast, and I love the prospect of putting you in a new car (I know I’d want one, therefore you must too). Perhaps what you really needed was an annual train pass, but I jumped right over that because I have no interest in trains, let alone public transport.

We in IT leadership really need to engage with the business much earlier, and far more deeply – so deeply that we really understand what’s going on at the front lines of the organisation, where the rubber hits the road, as it were. But this is hard work. I’d argue that in any reasonably complex business environment it is not possible for analysts to understand everything that is going on at the business level unless they have been at the organisation for at least 3-4 years. The best analysts have been in multiple roles, including operational roles, so that they have a deep understanding of a business area in terms of operations, support, technology, and business process. These analysts are very rare. If you have analysts like these, look after them! They are irreplaceable. Most analysts develop shallow approximations of what’s going on, and while these approximations can be useful, it’s very dangerous to treat them as if they are the complete picture. If IT leadership develops solutions based on these shallow approximations, they won’t build the right solution, at least not on the first attempt.

In the 1990s IT got away with top-down driven change. For example, we decided the company needed a PC on every desk, so we made it happen. But in 2014, things are nowhere near so simple. Our biggest IT projects often aren’t about providing this kind of basic infrastructure anymore – we already have that. What we need are new ways to do business, new ways to solve old problems, ways to get head and shoulders above the competition, ways to enable us to move faster. Yet IT persists in thinking, as it did in the 1990s, that it just needs to ask the business what the issues are, then come up with a snazzy technical solution.

However, there’s another way to approach organisational change, an approach that fosters trust and gets real buy-in, an approach born of the Scrum, Agile and Lean movements. It is to turn around the notion that we in IT or organisational leadership actually understand which changes would have the most benefit (even after asking the business), or whether change is required at all.

Instead we should eat some serious humble pie and try this: ask the folks on the front lines themselves what needs to change, and, critically, let them become the designers and leaders of change. Lead from behind by supporting them to get the changes they want, providing whatever they need – software, developers, analysts, trainers, etc. – but always work under the front-line leaders (in Scrum, you’d make them the Product Owner). Turn the organisation chart upside down, and respect and empower the front lines, for that is where your organisation creates value.

Critical to this approach is the concept of iterations – the build-measure-learn feedback loop. This means we change something (preferably something small), measure its effect, and learn from that what to try next. Repeating this loop is the key. It actually doesn’t even matter what change you make – the measurement step will tell you whether it was a positive or negative change, and that will drive your next iteration. The iterations need to be as short as possible for this to be effective. One-week iterations work well.

Of course it takes time for front-line workers to get into this way of thinking (and longer for management!) after accepting decades of top-down driven changes that have left them disillusioned and change-weary. But once they get the idea that they can have whatever changes they want, they’ll start flexing their new-found powers creatively and working out how to make best use of them. There is nothing as empowering and engaging as being in control of one’s own success. And with that new power should come new responsibility – the front lines, not their managers, should be responsible for their department’s results, because, after all, they are the ones creating those results, not their manager nor their manager’s manager.

The main drawback of the front-line driven approach I am arguing for is that big changes don’t happen that way. Finance systems don’t get replaced that way. Front-line staff are great at coming up with small improvements, but they may not be in the best position to see opportunities for big step changes.

The big changes may still need to come from IT leadership. But importantly, if IT staff are constantly working with front-line staff to make the small changes they need, then IT gains an enormous amount of understanding of the business during this process, and a huge amount of mutual trust is created if this is executed well. After 12 months of close work with the business users, IT staff are in a much better position to recommend larger “step” changes in the solution space.

Further reading about these concepts:

  1. Insecure Managers Don’t Want Your Suggestions
  2. Implementing Lean Software Development
  3. Peopleware: Productive Projects and Teams
  4. Extreme Programming Explained

The Problem Space

Monday, June 30, 2014
posted by daveb


Guess which requirement was written by a solution-space thinker, and which was written by a problem-space thinker:

Requirement A: As the Iceberg Manager I need to receive timely (same day) notice when an iceberg breaks free of the ice shelf so that a decision can be made on whether or not to track it.

Requirement B: As the Iceberg Manager I need the system to provide an online form for the Iceberg Scientists to fill in when an iceberg breaks free of the ice shelf.

The trouble is, requirements like B seem to riddle project/product backlogs. It is solution-space thinking, and it’s poor work if done by a professional Business Analyst. (As an aside, if your Business Analyst or Product Owner writes requirements like B above, you need to tell them, or at least tell the Project Manager, that the whole team is being excluded from the creative process by such “requirements”.) It leads us to a solution, short-circuiting the potential to explore the problem from the many creative angles that a typical software development team is capable of. It fails to describe the nuances of the problem at all.

On a project team, this effectively shuts down all creative thinking, except that which comes from the Business Analyst. Software Developers, Testers, Technical Writers and other technical people are intelligent and creative – that’s how they got where they are. It’s a terrible shame to exclude them from the creative process, and the quality of work, not to mention the morale, of the team will suffer greatly. Business Analysts and Product Owners are creative, intelligent thinkers too, but we really need them to take on the role of defining the problem as clearly as possible. There’s no reason they can’t be part of the solution design, but their first job is to define the problem-space clearly. And when they move into the solution-space, they should expect their voice to be one of many, and that it should carry no particular authority over the team.

Hence we need the Business Analyst or Product Owner more than ever to think in the problem-space. If you don’t have a dedicated Business Analyst or Product Owner on your team, you have to be especially careful to avoid the traps of premature solution-space thinking, which frequently happens when a team of solutionists hears about a problem that needs solving. Being solutionists, we like our solution to be the one that gets implemented. This means we have to be quick to voice our solution as early as possible in order to get the buy-in of the other team members. This is wrong-headed in itself and is a sign that a team is a group of competitive, self-interested individuals rather than a team at all. But worst of all, it drives the team to bypass the stage of really thinking about, and analyzing, the problem-space properly. And when we do that, we risk it all. We risk solving the wrong problem, or completely missing simple solutions, in favor of those that first come to mind, or those that are easier to convince the team to agree to.

I’ve been involved in plenty of projects where I’ve worked directly with the customer / entrepreneur / stakeholders, without a Business Analyst in the middle, so this is nothing new. However, I have come to value the input a good Business Analyst brings to the project more and more over the years.

There’s a point made clearly in the Lean software development process that goes something like this: “The largest source of waste in software projects is building the wrong thing”.

It’s kind of obvious when you think about it – should your team spend 3 months building a solution that does not solve the business need, and does not end up getting used, you’ve wasted 3 months’ effort. It’s as if you achieved nothing at all in 3 months. You would have been better off sending the whole project team away for a 3-month (paid) holiday. At least then they would have come back fresh and full of new energy. But I digress.

Getting back to the Business Analyst (or the equivalent person in the problem-space mindset), the value they add is a mindset purely focused on the problem-space. We in technology are so deeply involved in the solution-space that we find it hard to work in the problem-space in an open-minded way. When technologists, project managers, and others embedded in the solution-space attempt to write requirements that define the problem-space, we do a rotten job, because we usually have a solution in mind as we are writing. We then reverse-engineer the requirements from the solution such that the reader will be led towards our solution. Exploring the problem-space fully is essential if we are to be lean, because being lean means building the right thing, and building the minimal thing that will solve the problem. Defined that way, we’d be well advised to be very critical of the definition of the problem in the first place. One poorly chosen word might lead us towards unnecessary complexity.

So where have we got to with all this? We solutionists should take all requirements with a grain of salt. Requirements need to be criticized, questioned and tested (even if only through thought experiments), because it is essential to get the definition of the problem as accurate as possible, and free from misinterpretation. That means not jumping into solution-mode too early. We need to take on the responsibility to shake down the requirements, and we should be among their harshest critics. Only when requirements survive this test, and the scrutiny of the entire team and stakeholders, should solution-space thinking begin.




WS-(un)ReliableMessaging and the US Postal Service

Sunday, November 24, 2013
posted by daveb
Reliable messaging – sorry, but it’s not

Reliable messaging is the concept that I can send you a message, and even though the channel over which I’m sending it (let’s say, for argument’s sake, the US Postal Service) is not 100% reliable (the odd letter goes missing), I can still say what I need to say to you, and you can say what you need to say to me.

How it works is that when I send you a letter, you send one back to me as soon as you get it – a receipt if you like. I might send you another 2 or 3 letters, but I’m always keeping track of the receipts you send back, and if I find I’m missing a receipt, after a while, I’ll re-send that letter. I keep a copy of all letters I’ve sent you, just in case one goes missing and I need to re-send it. And with the re-sent letter, I’ll similarly track the receipt, and even re-send again, until you get it.
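
A minimal sketch of that keep-a-copy, track-receipts, re-send-on-timeout scheme (class and names hypothetical; in real systems this logic lives down in the transport layer):

    import time

    class ReliableSender:
        """Keeps a copy of every unacknowledged message and re-sends on timeout."""

        def __init__(self, channel, timeout=5.0):
            self.channel = channel    # an unreliable send(seq, payload) callable
            self.timeout = timeout
            self.unacked = {}         # seq -> (payload, last_sent_at)
            self.next_seq = 0

        def send(self, payload):
            seq = self.next_seq
            self.next_seq += 1
            self.unacked[seq] = (payload, time.monotonic())
            self.channel(seq, payload)       # may silently lose the message

        def on_receipt(self, seq):
            self.unacked.pop(seq, None)      # receipt arrived; discard our copy

        def resend_overdue(self):
            now = time.monotonic()
            for seq, (payload, sent_at) in list(self.unacked.items()):
                if now - sent_at > self.timeout:
                    self.unacked[seq] = (payload, now)
                    self.channel(seq, payload)   # keep re-sending until receipted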

This scheme, by the way, is exactly what goes on in the TCP/IP protocol, one of the key protocols of the Internet. And it works very well. We have an unreliable network (the Internet), and yet we can give some level of assurance that data gets to where it needs to go if we use the TCP/IP protocol.

The key difference between the way TCP/IP works and the way the letter-exchange example works is this: in the case of TCP/IP, we have a protocol stack that consists of layers on top of other layers, each unaware of the others, and each having a distinct responsibility. TCP is a “transport” layer protocol – its job is to transport data between 2 endpoints, and to conceal all the tricky ACKs and re-transmission stuff from the upper layers of the protocol stack. It delivers a useful service to the upper protocol layers.

This all sounds good, until you realise that TCP/IP is not sufficient to guarantee message delivery between applications over the network. Why not? Let’s say my application sends your application an electronic message over TCP/IP. Your TCP/IP stack gets my data and ACKs it, I get your ACK and, as far as TCP is concerned, its job is done. We can even close the connection. Then the unthinkable happens: your application crashes and loses my message. I will never know you’ve lost the data, and yet I have the receipt giving me a completely false sense of security that the data got to you.

What went wrong here? Why did the seemingly robust process break down? To answer this we have to go back to the letter-exchange example. But now, instead of us being aware that there is a protocol required to overcome the US Postal Service’s inherent unreliability, we instead use “Reliable Post Inc”, a competitor to the US Postal Service. “Reliable Post Inc” implements the re-transmission and receipting mechanism for us, to take away all that annoying copying, receipting and re-transmission stuff. Now let’s say “Reliable Post Inc” arrives at your mailbox with the letter, delivers it, then issues me the receipt. But in an unfortunate accident your letterbox is burnt to the ground in a Guy Fawkes prank before you could get your mail. You never get the letter. I have your receipt (because “Reliable Post Inc” sent it to me once they had dropped the letter into your box), so I have this false sense of security that you have it. What “Reliable Post Inc” should have done is wait for you to tell them that you had received and read the message. Only then should they send me the receipt. But this is annoying and involves you in the receipting process, and the whole idea of outsourcing delivery to “Reliable Post Inc” was so that we didn’t have to think about that.

So now we’ve brought the letter analogy back in line with TCP/IP, and what we’ve discovered is that, if we really want 100% reliability, we cannot simply outsource it to another party, because as soon as we do that, there’s this weak point where the message is handed over between us and the underlying service. True reliability is just something you and I are going to have to be aware of, and have a protocol in place to deal with.
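
In code terms: the receipt must be issued by the receiving application itself, and only after the message has been durably processed – not by the transport the moment the bytes arrive. A sketch, with all names hypothetical:

    def handle_incoming(message, database, send_receipt):
        # Durably process the message FIRST (e.g. commit it in a transaction)...
        database.save(message)

        # ...and only then issue the business-level receipt. If we crash before
        # this line, the sender never sees a receipt, times out, and re-sends -
        # exactly the recovery behaviour we want.
        send_receipt(message["id"])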

Enter WS-ReliableMessaging. I won’t explain how it works, because it’s basically like TCP/IP, but at a higher layer up the WS-* (SOAP/XML based) protocol stack. Which raises the immediate question: if TCP/IP (over which most SOAP messages ultimately find themselves being transmitted) didn’t give us reliable messaging, how exactly is another layer, which does exactly the same thing, going to achieve it?

Of course the answer is, it doesn’t, for the exact same reason TCP/IP doesn’t: you can’t completely outsource reliability to another party.

In terms of WS-ReliableMessaging, it can improve reliability for a certain class of message failure, but don’t let it fool you into thinking you have 100% reliability – you’re still going to have to develop your own protocol to deal with failure after the message has been receipted. This makes reliability a bona fide business concern.

WS-ReliableMessaging makes the following claims which it calls Delivery Assurances:

  1. Messages arrive at least once
  2. Messages arrive at most once
  3. Messages arrive exactly once
  4. Messages arrive in the order they were sent

Item 1 can be better resolved as a business concern, as we have seen. Item 2 can be handled at the business level by making message interactions idempotent. Item 3 is simply the intersection of 1 and 2.
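
For example, idempotent handling can be as simple as remembering which business message IDs have already been processed, so that a duplicate caused by a re-send becomes a harmless no-op. A sketch (names hypothetical):

    processed_ids = set()   # in practice, a durable store updated in the same transaction

    def charge_card(card, amount):
        print("charging", amount, "to", card)   # stand-in for the real side effect

    def handle_payment(message):
        # A duplicate delivery (say, a re-send after a lost receipt) is ignored,
        # so "at least once" delivery behaves like "exactly once" to the business.
        if message["id"] in processed_ids:
            return
        processed_ids.add(message["id"])
        charge_card(message["card"], message["amount"])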

Item 4 is about order, and this is interesting. Marc de Graauw explains the relationship between the order of operations and the business layer in his InfoQ article:

The first strange thing is that apparently the order is a property of messages which is important to the business layer. So if it is important to the business layer, why isn’t there a sequence number in the business message itself? We have a message, with its own business-level semantics, and the order is important: so why isn’t there some element or attribute in the message, on a business level, which indicates the order?

– source: Marc de Graauw’s InfoQ article (linked below)
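
If ordering really matters to the business, the sequence belongs in the business message itself, and the consumer can enforce it. A sketch of what de Graauw is pointing at (field names hypothetical):

    # The ordering travels in the business message, not in a transport header.
    message = {"id": "payment-0001", "sequence": 1, "body": {"amount": 99.95}}

    class InOrderConsumer:
        """Buffers out-of-order messages and processes them in business order."""

        def __init__(self, process, first_seq=1):
            self.process = process   # callback that handles a single message
            self.pending = {}        # sequence -> message, waiting for its turn
            self.next_seq = first_seq

        def receive(self, message):
            self.pending[message["sequence"]] = message
            # Release every message whose turn has come, in order.
            while self.next_seq in self.pending:
                self.process(self.pending.pop(self.next_seq))
                self.next_seq += 1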

To back up my point that true reliability is a business concern as opposed to a protocol concern, I thought I’d share a transcript of an interview between Carl Franklin and Richard Campbell (of the Dot Net Rocks podcast) and their guest Jim Webber. Jim Webber is a contributor to some of the WS-* standards that come out of OASIS. This is from 29th April 2008, but still very relevant today.

Jim Webber: The reliable messaging stuff is actually relatively straightforward too. It’s a protocol where we tag message sequence, metadata into messages and recipients of messages may notice when they are missing a sequence number or two and they can ask for retransmission. The protocol itself is relatively straightforward. Look for gaps in numbers and ask those gaps to be filled in if you find they’re missing. There are some subtleties around how to build that. For example if I’m the sender of the message, I have to hold onto that message until I’m quite sure that it has been ACK’ed by the recipient because I may be asked at any point until I have been ACK’ed to retransmit. But ultimately this stuff is not really too dissimilar from the kind of stuff that goes on way down the stack in TCP.

Richard Campbell: I mean the concepts are pretty straightforward. It’s just how we’re going to recover from a message that never showed up.

Jim Webber: Absolutely. So the irony of reliable messaging is that it is not reliable messaging. It can cover up glitches.

Richard Campbell: It is recoverable messaging.

Jim Webber: It is somewhat recoverable. So it would cover the odd glitch where a message or two goes missing but the minute that a meteorite smashes through your data center no amounts of reliable messaging on the planet is going to help you recover immediately from that catastrophe.

Richard Campbell: But the outside world is going to get a clear notice that their messages didn’t get delivered.

Jim Webber: Absolutely. Although I think WS-ReliableMessaging and friends have some validity, I actually think a much more robust pattern is to make your services aware of the protocols to which they conform. I know that sounds really lofty but what I actually mean is if you write your services so that they are message centric and so that they understand that message A is followed by message B or followed by message C or message D then those services know when something has gone wrong and they can be programmed robustly to react to the failure of a message. The problem with the WS-ReliableMessaging and forgive me to my friends who are involved writing some of those specs, but the problem is they encourage some sloppy thinking on the part of the service developer again.

If you take WSDL and WS-ReliableMessaging the appealing thought is OK I’m reliable now. I don’t need to worry about the protocol that my service works with. I just can do this RPC style thing and the reliable messaging protocol will take care of any glitches, which is only true up to a point and when you get an actual programmatic failure which WS-ReliableMessaging can’t mask, it leaks at a really inopportune moment and cripples your service. Although I can actually see the utility [of WS-ReliableMessaging], when I’m building services I tend to avoid it because I want my services to know when there should be a message there for them and to take proactive action, as it is more robust that way, to chase down those messages when they don’t arrive.

The full transcript of the interview is available, along with the audio from Dot Net Rocks.


Further reading:

  • Nobody needs reliable messaging (Marc de Graauw, InfoQ)
  • WS-ReliableMessaging Wikipedia article
  • WS-ReliableMessaging OASIS Standard


Enterprise IT is broken (part 1)

Monday, November 18, 2013
posted by daveb

Broken windows in an abandoned St Petersburg cinema


You may have heard of Conway’s law:

“.. organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations”

Conway’s law has proven true for every software project I have ever been involved in. Take the client-server application where the client was developed in C++ and the server in C. The client-side developers were young and hip, and into OO. The server-side developers were ex-mainframe developers in their 50s. It’s fair to say the two parties did not see eye to eye about much when it came to software design. The friction between the parties came to a head in the shared communication libraries for client-server communication, which they co-developed. The libraries were littered with pre-processor definitions and macros that seemed to be at war with one another. The arguments between teams carried over into the version control comments. The shared communication libraries were some of the most un-maintainable, un-readable and bug-ridden code imaginable, even though the purely client-side and purely server-side code were reasonably tidy on their own.

It was around 2008 when I began my adventures into this thing we call “the enterprise”. I was going to be an “enterprise developer” and take on development of a key part of the infrastructure – a credit card transaction processing interface. I understood that enterprise development meant you had to be lean – unlike a software company selling software products or software as a service, there were no economies of scale – you only build one instance of enterprise software.

As I began to find my way round some of the custom developed services and applications, a few questions started to emerge – like what version control system do you use here? Answer: “yes we have been meaning to get to that”. Ok, so there had been only one developer on that part of the project previously, and he was a bit green, so I decided I shouldn’t pre-judge the organization based on that alone.

More questions started to appear as my predecessor walked me through the operations side of things. He showed me how they had left an older version of the credit card processing API in production because there were a number of external clients using that interface, and they could not force them to upgrade to the new interface. Fair enough. I asked where the source code was for the old version, in case I needed to go back to it should a bug need to be fixed. Answer: “… well there shouldn’t really be any bugs, because it’s been there for years now”.

It turned out that work had started on “version 2” without any kind of version control branch or label, or even so much as a simple zip backup of the “version 1” source code. They had lost the intellectual property investment of the stable “version 1”, and had re-written large chunks of it to create “version 2”, which was not backward compatible, and was considerably less stable than the previous version. Unbelievable.

“Version 2” had been 18 months in development, and had only very recently been deployed to production. Therefore, no business value had been delivered for 18 months. Business stakeholders had lost patience, and had almost completely lost confidence in the development team.

Since the recent “version 2” update, the phone had been ringing hot, and my predecessor would have an often-lengthy discussion with an upset customer who had lost sales due to downtime and bugs in the service. I was now supposed to take these calls, and be an apologist for this mess.

At this point, things were looking so bad I was seriously considering walking out the door before I was even two weeks into the job.

However, I resolved to take on the project as a challenge, and that is the only thing that kept me there. I enquired about the testing approach: unit testing, integration testing, user acceptance testing and so on. In short:

Unit Testing: “what’s that exactly?”

Integration Testing: a spreadsheet containing a single sheet with dozens of columns and a hundred or so rows, each representing a test scenario. It was un-printable, un-readable, inaccurate, and the un-loved product of something the boss had instructed the developers to do. The developers didn’t feel their job was to test the product, and instead of resolving this dispute with the boss, they had yielded to his pressure, but then done the worst possible job of it, to make the point that developers can’t test! This communication breakdown, and many other examples like it, had eroded almost all value from the services being developed.

User Acceptance Testing: none

As we delved into architecture there were more surprises waiting for me. Like the messaging interface that used a database table as a queue, and had a service polling the table for new records every 500ms. This, I later discovered, would occasionally deadlock itself with another transaction and become the deadlock victim, meaning someone’s credit card transaction failed. The reason for using a table as a queue: the solution architect was a relational database guy and insisted this solution be adopted when the developers had hit some problems with the message queuing technology they were using.
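
As an aside, the table-as-queue pattern can be made far less deadlock-prone: on SQL Server, the usual trick is to dequeue with a READPAST hint so that concurrent pollers skip rows other transactions have locked, instead of blocking behind them. A sketch (table, columns and connection details all hypothetical, using pyodbc):

    import pyodbc

    # READPAST lets a poller skip rows another transaction holds locks on,
    # removing the lock contention behind the deadlocks described above.
    DEQUEUE_SQL = """
        DELETE TOP (1) FROM message_queue WITH (ROWLOCK, READPAST)
        OUTPUT deleted.id, deleted.payload
        WHERE status = 'ready';
    """

    def poll_once(conn):
        cursor = conn.cursor()
        cursor.execute(DEQUEUE_SQL)
        row = cursor.fetchone()   # None when the queue is empty
        conn.commit()
        return row

    # Usage (connection string hypothetical):
    # conn = pyodbc.connect("DSN=payments")
    # item = poll_once(conn)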

Turns out there were more surprises in store.

What is unbelievable is not that this dysfunctional situation could exist, but that project management and project stakeholders had no idea that these problems existed in the development practices and system architecture. They knew something was wrong, but lacked the ability to see any deeper inside the box. Nor did they have any notion that the industry has long since found best practices that address the very problems that were slowly destroying all value in the services they were providing.

At first I thought this organization was an anomaly, and that I would be unlikely to see anything this bad again. But then I started hearing about others who had seen similar things. And then I saw inside other organizations. I started to realize that what I’d seen was not an anomaly at all, it was practically commonplace. Sure, some were better than others, but I had yet to see inside an enterprise that had anything even remotely approaching a quality development team, with a solid set of practices that was able to deliver business value.

Conway’s law seemed to be holding true. Friction between personalities and departments led directly to confusing and inconsistent software architectures. In fact, Conway’s law can even be used in reverse – where strange inconsistencies exist in a software architecture, you get an insight into the friction between the different personalities or departments involved in its design.

If you want to assess your development team, and you aren’t a developer, just use the Joel Test. It goes like this:

  1. Do you use source control?
  2. Can you make a build in one step?
  3. Do you make daily builds?
  4. Do you have a bug database?
  5. Do you fix bugs before writing new code?
  6. Do you have an up-to-date schedule?
  7. Do you have a spec?
  8. Do programmers have quiet working conditions?
  9. Do you use the best tools money can buy?
  10. Do you have testers?
  11. Do new candidates write code during their interview?
  12. Do you do hallway usability testing?

Add one point for every “yes” answer and there’s your score. As Joel himself says:

“The truth is that most software organizations are running with a score of 2 or 3, and they need serious help, because companies like Microsoft run at 12 full-time.”