Reliable messaging is the concept that I can send you a message, and even though the channel over which I’m sending it (let’s say, for argument’s sake, the US Postal Service) is not 100% reliable (the odd letter goes missing), I can still say what I need to say to you, and you can say what you need to say to me.
How it works is that when I send you a letter, you send one back to me as soon as you get it – a receipt if you like. I might send you another 2 or 3 letters, but I’m always keeping track of the receipts you send back, and if I find I’m missing a receipt, after a while, I’ll re-send that letter. I keep a copy of all letters I’ve sent you, just in case one goes missing and I need to re-send it. And with the re-sent letter, I’ll similarly track the receipt, and even re-send again, until you get it.
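The letter-exchange protocol above can be sketched in a few lines of Python. This is purely illustrative (the lossy channel stands in for the postal service, and loss of the receipt is modelled separately from loss of the letter):

```python
import random

def lossy_send(message, inbox, loss_rate=0.3):
    """Try to deliver one message; return True only if a receipt gets back."""
    if random.random() < loss_rate:
        return False                       # letter lost in transit
    inbox.append(message)                  # letter arrived at the receiver
    return random.random() >= loss_rate    # the receipt itself may be lost

def send_reliably(messages, inbox, max_rounds=100):
    unacked = dict(enumerate(messages))    # keep copies until receipted
    for _ in range(max_rounds):
        for seq in list(unacked):
            if lossy_send((seq, unacked[seq]), inbox):
                del unacked[seq]           # receipt in hand, discard the copy
        if not unacked:
            return True                    # everything was receipted
    return False

random.seed(7)
inbox = []
assert send_reliably(["hello", "how are you?", "goodbye"], inbox)
```

Note that when only the receipt goes missing, the re-send means the receiver sees the same letter twice: retransmission buys you "at least once" delivery, not "exactly once".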
This, by the way, is exactly what goes on in the TCP protocol, one of the key protocols of the Internet. And it works very well. We have an unreliable network (the Internet), and yet we can give some level of assurance that data gets to where it needs to go if we use TCP/IP.
The key difference between the way TCP/IP works and the way the letter exchange example works is this: in the case of TCP/IP, we have a protocol stack that consists of layers on top of other layers, each unaware of the others, and each having a distinct responsibility. TCP is a “transport” layer protocol – its job is to transport data between two endpoints, and conceal all the tricky ACKs and re-transmission stuff from the upper layers of the protocol stack. It delivers a useful service to the upper protocol layers.
This all sounds good, until you realise that TCP/IP is not sufficient to guarantee message delivery between applications over the network. Why not? Let’s say my application sends your application an electronic message over TCP/IP. Your TCP/IP stack gets my data and ACKs it, I get your ACK and, as far as TCP is concerned, its job is done. We can even close the connection. Then the unthinkable happens: your application crashes and loses my message. I will never know you’ve lost the data, and yet I have a receipt giving me a completely false sense of security that the data got to you.
What went wrong here? Why did the seemingly robust process break down? To answer this we have to go back to the letter exchange example. But now, instead of us being aware that a protocol is required to overcome the US Postal Service’s inherent unreliability, we use “Reliable Post Inc”, a competitor to the US Postal Service. “Reliable Post Inc” implements the re-transmission and receipting mechanism for us, to take away all that annoying copying, receipting and re-transmission stuff. Now let’s say “Reliable Post Inc” arrives at your mailbox with a letter they have already receipted me for, and your dog attacks the poor postman, chews up my letter and swallows it in one gulp, before proceeding to lift its leg on the post van. The postman drives away as fast as he can, forgetting for a moment that the whole point of “Reliable Post Inc” was to guarantee letter delivery. You never get the letter. I have your receipt (because “Reliable Post Inc” sent it to me when they got the letter), so I have a false sense of security that you have it. What “Reliable Post Inc” should have done is waited for you to confirm that you had read the message, and only then sent me the receipt. But this is annoying and involves us in the receipting process, and the whole idea of outsourcing delivery to “Reliable Post Inc” was so that we didn’t have to think about that.
So now we’ve brought the letter analogy back in line with TCP/IP, and what we’ve discovered is that, if we really want 100% reliability, we cannot simply outsource it to another party, because as soon as we do that, there’s this weak point where the message is handed over between us and the underlying service. True reliability is just something you and I are going to have to be aware of, and have a protocol in place to deal with.
Enter WS-ReliableMessaging. I won’t explain how it works, because it’s basically like TCP/IP, but at a higher layer up the WS-* (SOAP/XML based) protocol stack. Which raises the immediate question: if TCP/IP (over which most SOAP messages ultimately find themselves being transmitted) didn’t give us reliable messaging, how exactly is another layer which does exactly the same thing going to achieve it?
Of course the answer is, it doesn’t, for the exact same reason TCP/IP doesn’t: you can’t completely outsource reliability to another party.
In terms of WS-ReliableMessaging, it can improve reliability for a certain class of message failure, but don’t let it fool you into thinking you have 100% reliability – you’re still going to have to develop your own protocol to deal with failure after the message has been receipted. This makes reliability a bona fide business concern.
WS-ReliableMessaging makes the following claims which it calls Delivery Assurances:
1. Messages arrive at least once
2. Messages arrive at most once
3. Messages arrive exactly once
4. Messages arrive in the order they were sent
Item 1 can be better resolved as a business concern, as we have seen. Item 2 can be handled at the business level by making message interactions idempotent. Item 3 is simply the combination of items 1 and 2.
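Idempotency at the business level needs surprisingly little machinery: give each message a unique id, and remember which ids you have already processed. A minimal sketch (the names are made up, and the in-memory set stands in for a durable store in a real system):

```python
processed_ids = set()   # in production this would be a durable store
balance = {}

def handle_payment(message_id, account, amount):
    """Apply a payment once; redelivered duplicates become no-ops."""
    if message_id in processed_ids:
        return "duplicate ignored"     # at-least-once becomes exactly-once
    processed_ids.add(message_id)
    balance[account] = balance.get(account, 0) + amount
    return "applied"

assert handle_payment("msg-001", "acct-1", 100) == "applied"
assert handle_payment("msg-001", "acct-1", 100) == "duplicate ignored"
assert balance["acct-1"] == 100        # applied exactly once
```

Combined with retransmission on the sender side (at least once) this gives you exactly-once processing without any protocol-level delivery assurance.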
Item 4 is about order, and this is interesting. Marc de Graauw explains the relationship between the order of operations and the business layer in his InfoQ article:
“The first strange thing is that apparently the order is a property of messages which is important to the business layer. So if it is important to the business layer, why isn’t there a sequence number in the business message itself? We have a message, with its own business-level semantics, and the order is important: so why isn’t there some element or attribute in the message, on a business level, which indicates the order?”
To back up my point that true reliability is a business concern as opposed to a protocol concern, I thought I’d share a transcript of an interview between Carl Franklin and Richard Campbell (of the .NET Rocks podcast) and their guest Jim Webber, a contributor to some of the WS-* standards that come out of OASIS. This is from 29th April 2008, but still very relevant today.
Jim Webber: The reliable messaging stuff is actually relatively straightforward too. It’s a protocol where we tag message sequence metadata into messages, and recipients of messages may notice when they are missing a sequence number or two and can ask for retransmission. The protocol itself is relatively straightforward: look for gaps in the numbers, and ask for those gaps to be filled in if you find they’re missing. There are some subtleties around how to build that. For example, if I’m the sender of a message, I have to hold onto that message until I’m quite sure it has been ACKed by the recipient, because I may be asked to retransmit at any point until it has been ACKed. But ultimately this stuff is not really too dissimilar from the kind of stuff that goes on way down the stack in TCP.
Richard Campbell: I mean the concepts are pretty straightforward. It’s just how we’re going to recover from a message that never showed up.
Jim Webber: Absolutely. So the irony of reliable messaging is that it is not reliable messaging. It can cover up glitches.
Richard Campbell: It is recoverable messaging.
Jim Webber: It is somewhat recoverable. So it would cover the odd glitch where a message or two goes missing but the minute that a meteorite smashes through your data center no amounts of reliable messaging on the planet is going to help you recover immediately from that catastrophe.
Richard Campbell: But the outside world is going to get a clear notice that their messages didn’t get delivered.
Jim Webber: Absolutely. Although I think WS-ReliableMessaging and friends have some validity, I actually think a much more robust pattern is to make your services aware of the protocols to which they conform. I know that sounds really lofty, but what I actually mean is: if you write your services so that they are message centric, and so that they understand that message A is followed by message B or message C or message D, then those services know when something has gone wrong and they can be programmed to react robustly to the failure of a message. The problem with WS-ReliableMessaging (and forgive me, my friends who were involved in writing some of those specs) is that it encourages some sloppy thinking on the part of the service developer again.
If you take WSDL and WS-ReliableMessaging, the appealing thought is: OK, I’m reliable now. I don’t need to worry about the protocol that my service works with; I can just do this RPC style thing and the reliable messaging protocol will take care of any glitches. Which is only true up to a point, and when you get an actual programmatic failure which WS-ReliableMessaging can’t mask, it leaks at a really inopportune moment and cripples your service. Although I can actually see the utility [of WS-ReliableMessaging], when I’m building services I tend to avoid it, because I want my services to know when there should be a message there for them and to take proactive action to chase down those messages when they don’t arrive. It is more robust that way.
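The gap-detection idea Jim describes – the receiver tracks sequence numbers and asks for the missing ones – can be sketched in a few lines (illustrative only, not taken from any spec):

```python
def find_gaps(received_seqs):
    """Return the missing sequence numbers up to the highest one seen.

    The receiver would ask the sender to retransmit exactly these;
    the sender must keep every message until it has been ACKed.
    """
    if not received_seqs:
        return []
    seen = set(received_seqs)
    return [n for n in range(1, max(seen) + 1) if n not in seen]

assert find_gaps([1, 2, 4, 7]) == [3, 5, 6]
assert find_gaps([1, 2, 3]) == []
```

Note what this cannot detect: a message lost *after* the final sequence number the receiver happens to have seen, or one lost after it was ACKed – which is precisely the failure mode discussed earlier.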
You may have heard of Conway’s law:
“… organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations”
Conway’s law has proven to be true for every software project I have ever been involved in. Take the client-server application where the client was developed in C++ and the server in C. The client side developers were young and hip, and into OO. The server side developers were ex-mainframe developers in their 50s. It’s fair to say the two parties did not see eye to eye about much when it came to software design. The friction between the parties came to a head in the shared communication libraries for client-server communication, which they co-developed. The libraries were littered with pre-processor definitions and macros that seemed to be at war with one another. The arguments between teams carried over into the version control comments. The shared communication libraries were some of the most un-maintainable, un-readable and bug ridden code imaginable, even though the purely client-side and purely server-side code were reasonably tidy on their own.
It was around 2008 when I began my adventures into this thing we call “the enterprise”. I was going to be an “enterprise developer” and take on development of a key part of the infrastructure – a credit card transaction processing interface. I understood that enterprise development meant you had to be lean – unlike a software company selling software products or software as a service, there were no economies of scale – you only build one instance of enterprise software.
As I began to find my way round some of the custom developed services and applications, a few questions started to emerge – like what version control system do you use here? Answer: “yes we have been meaning to get to that”. Ok, so there had been only one developer on that part of the project previously, and he was a bit green, so I decided I shouldn’t pre-judge the organization based on that alone.
More questions started to appear as my predecessor walked me through the operations side of things. He showed me how they had left an older version of the credit card processing API in production because there were a number of external clients using that interface, and they could not force them to upgrade to the new interface. Fair enough. I asked where the source code was for the old version, in case a bug needed to be fixed. Answer: “… well there shouldn’t really be any bugs, because it’s been there for years now”.
It turned out that work had started on “version 2” without any kind of version control branch or label, or even so much as a simple zip backup of the “version 1” source code. They had lost the intellectual property investment in the stable “version 1”, and had re-written large chunks of it to create “version 2”, which was not backward compatible, and was considerably less stable than the previous version. Unbelievable.
“Version 2” had been 18 months in development, and had only very recently been deployed to production. Therefore, no business value had been delivered for 18 months. Business stakeholders had lost patience, and had almost completely lost confidence in the development team.
Since the recent “version 2” update, the phone had been ringing hot, and my predecessor would have an often lengthy discussion with an upset customer who had lost sales due to downtime and bugs in the service. I was now supposed to take these calls, and be an apologist for this mess.
At this point, things were looking so bad I was seriously considering walking out the door before I was even two weeks into the job.
However, I resolved to take on the project as a challenge, and that is the only thing that kept me there. I enquired about the testing approach: unit testing, integration testing, user acceptance testing and so on. In short:
Unit Testing: “what’s that exactly?”
Integration Testing: a spreadsheet containing a single sheet with dozens of columns and a hundred or so rows, each representing a test scenario. It was unprintable, unreadable, inaccurate, and the unloved product of something the boss had instructed the developers to do. The developers didn’t feel their job was to test the product, and instead of resolving this dispute with the boss, they had yielded to his pressure and then done the worst possible job of it, to make the point that developers can’t test! This communication breakdown, and many other examples like it, had eroded almost all value from the services being developed.
User Acceptance Testing: none
As we delved into the architecture there were more surprises waiting for me. Like the messaging interface that used a database table as a queue, with a service polling the table for new records every 500ms. This, I later discovered, would occasionally deadlock with another transaction and be chosen as the deadlock victim, meaning someone’s credit card transaction failed. The reason for using a table as a queue: the solution architect was a relational database guy and insisted this solution be adopted when the developers hit some problems with the message queuing technology they were using.
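For the curious, the table-as-queue pattern looks something like the following sketch (Python with an in-memory SQLite database, purely illustrative). The essential point is that claiming and removing a row must happen in one transaction; the read-then-update race in the original system is exactly what invited the deadlocks (on SQL Server the usual remedy involves locking hints like UPDLOCK/READPAST):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage txns
conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO queue (payload) VALUES ('txn-1'), ('txn-2')")

def dequeue(conn):
    """Atomically claim and remove the oldest row, or return None."""
    conn.execute("BEGIN IMMEDIATE")   # take the write lock up front
    row = conn.execute(
        "SELECT id, payload FROM queue ORDER BY id LIMIT 1").fetchone()
    if row:
        conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return row

drained = []
while (row := dequeue(conn)) is not None:
    drained.append(row[1])
    # a real worker would sleep ~500ms here between empty polls

assert drained == ["txn-1", "txn-2"]
```

Even done carefully, you still pay for the polling latency and the database load, which is why a real message queue is usually the better tool.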
What is unbelievable is not that this dysfunctional situation could exist, but that project management and project stakeholders had no idea that these problems existed in the development practices and system architecture. They knew something was wrong, but lacked the ability to see any deeper inside the box. Nor did they have any notion that the industry has long since found best practices that address the very problems that were slowly destroying all value in the services they were providing.
At first I thought this organization was an anomaly, and that I would be unlikely to see anything this bad again. But then I started hearing about others who had seen similar things. And then I saw inside other organizations. I started to realize that what I’d seen was not an anomaly at all, it was practically commonplace. Sure, some were better than others, but I had yet to see inside an enterprise that had anything even remotely approaching a quality development team, with a solid set of practices that was able to deliver business value.
Conway’s law seemed to be holding true. Friction between personalities and departments led directly to confusing and inconsistent software architectures. In fact, Conway’s law can even be used in reverse: where strange inconsistencies exist in a software architecture, you get an insight into the friction between the different personalities or departments involved in its design.
If you want to assess your development team, and you aren’t a developer, just use the Joel Test. It goes like this:
- Do you use source control?
- Can you make a build in one step?
- Do you make daily builds?
- Do you have a bug database?
- Do you fix bugs before writing new code?
- Do you have an up-to-date schedule?
- Do you have a spec?
- Do programmers have quiet working conditions?
- Do you use the best tools money can buy?
- Do you have testers?
- Do new candidates write code during their interview?
- Do you do hallway usability testing?
Add one point for every “yes” answer and there’s your score. As Joel himself says:
“The truth is that most software organizations are running with a score of 2 or 3, and they need serious help, because companies like Microsoft run at 12 full-time.”
“Today’s knowledge workers, unlike the factory workers of the Industrial Revolution, own the means of production. Ultimately, knowledge workers are volunteers, since whether they return to work is completely based on their volition.” – Ron Baker
When you think about it, this statement has profound consequences for the way we think about knowledge workers. It also confines most 20th century management techniques to the history books. If knowledge workers own the means of production (their brains), then aren’t they the new capitalists? If so, I suppose computer software is the new proletariat?
We are proud to announce our new sister web site to promote mobile app development services.
It’s called Mobiliser, to emphasize the dynamic, fast moving nature of mobile applications.
Ocean Web continues to supply advanced web application development services, but now also offers mobile device application development for iOS, Android, Windows Phone, and more, targeting custom development for business apps.
This is a recognition of the fact that mobile apps are rising so fast in their capabilities and general acceptance in the business world that our business customers are already asking “can we have a mobile version of this web application?”. We expect that, at the rate things are going, businesses will soon depend heavily on mobile apps as their first point of contact with business data. In other words, there will be a reversal of the trend of developing web applications as the default way to access business data: we’ll expect mobile device apps first, and maybe a web version as an afterthought.
These are certainly interesting times.
There are a few things about sending automated emails from an application that stump most people the first time they try it, so I thought I’d share some hard won advice.
These notes relate to setting up a Windows Server to send email, typically for a web application that needs to send outbound email.
1. You will want to install your own mail server, probably on your web server. I would suggest either hMailServer or Xeams. These are fantastic, free, open source tools. The key advantage of having your own mail server installed locally is logging. These products give you the level of logging you need when your users call you and advise that they have not received a particular email notification that your application purports to have sent. A second reason to have a product like this installed locally is that they can manage automatic retries on your behalf. A third reason is that having a local mail server means your application does not block when you send an email (it completes almost instantaneously as there is no network latency).
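To illustrate point 1, here is a hedged sketch of how an application hands mail to a locally installed server (the addresses, host and port are placeholders, not real infrastructure). The SMTP hop to localhost returns almost instantly, and the local server takes over queuing, retries and logging:

```python
import smtplib
from email.message import EmailMessage

def build_notification(to_addr, subject, body):
    """Assemble an application notification email."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"   # placeholder sender
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_via_local_server(msg, host="localhost", port=25):
    # Local hop only: no WAN latency, and the local mail server
    # (hMailServer, Xeams, ...) logs and retries on our behalf.
    with smtplib.SMTP(host, port, timeout=5) as smtp:
        smtp.send_message(msg)

msg = build_notification("user@example.com", "Order shipped",
                         "Your order is on its way.")
# send_via_local_server(msg)  # requires a local SMTP server to be running
```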
2. You should set up an SPF record with your DNS provider for your web application’s domain. This will reduce the chance that the receiving mail server rejects your email messages as spam, by showing that your web server is authorised by the domain owner to send outbound email. An SPF record is actually a TXT record which complies with a standardised format. I suggest using an online tool to help you create the SPF record correctly.
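For illustration, an SPF record for a hypothetical domain might look like this (the domain name and IP address are placeholders):

```
example.com.  IN  TXT  "v=spf1 a mx ip4:203.0.113.10 -all"
```

Here `a` and `mx` authorise the hosts in the domain’s A and MX records, `ip4:` adds the web server’s address, and `-all` tells receivers to reject mail from anywhere else (use `~all` for a soft fail while you are still testing).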
Allow me to share a discussion I started in LinkedIn’s “Getting IT Right” group.
The recruiting industry has created a wonderfully self-serving “skills shortage” – perhaps in much the same way that the financial industry was able to create its own bubble.
There aren’t enough square pegs for all the square holes, yet we are sitting on mountains of round, triangular and hexagonal pegs of all colours and sizes, all desperate for a chance!
There’s talented people everywhere, but you have to be a bit open minded, and you have to be prepared to take a rough diamond and shape it yourself. It is myopic and lazy to sit back and wait for the perfectly qualified candidate to be presented to you.
There is a widespread assumption that a recognised certification in a particular discipline equates to a competency in that discipline, and if we simply find the candidates with skills X and Y they will likely be able to do the job that requires skills X and Y. How simple recruiting is, right?
I think that assumption is grossly naive, and leads to good job candidates being overlooked every day, to the detriment of the company and the nation as a whole. I would suggest there is not so much an IT skills shortage, but a chronic lack of people able to spot talent if it walked right past them.
Recruiters with no in-depth knowledge of an industry (most of them) will add no value above CV keyword filtering software. How can they be expected to? (By the way, anyone who thinks IT is one industry doesn’t understand the IT industry – IT probably has more specialised niches and roles than medicine.)
Certifications are just one indicator that an individual may have a competency in a given field. Certifications typically rely on exams, and exam questions are crammed, memorised, then recited. The certification then only proves that someone has a good memory and general literacy in the field. That is useful, but memory is a skill that is becoming less and less critical in a searchable online IT world where facts can be found quickly and easily.
There are far more powerful indicators of competency. If you are interviewing someone who claims knowledge of a particular field, and you (the interviewer) are experienced in that field, you should be able to spot a fake in three questions or fewer. If you cannot, you add no value to the process, and you should not be interviewing. It is even possible in zero questions: just get them talking about their experiences in their field, or their areas of interest in the field if they lack experience. If they know nothing about the field, let them talk about their other life experiences, how they teach themselves new things, what motivates them, why you should employ them, and so on.
You will quickly gauge their enthusiasm, the depth of their understanding, and their approach to life. Do they recite conventional, textbook answers to common problems, or do they think for themselves? Can they provide multiple solutions and ideas for solving a problem, or are they simply trying to give you the answer they think you want? Ask open questions, give them no clue as to the expected answer (indeed, have no expected answer) and just let them go for it.
This means that wading through CVs just got a whole lot harder, right? Perhaps, but if you can spot a fake versus someone worth talking to, then you can cut an interview down to 5 minutes or less for the unsuitable, and spend longer on the more suitable candidates.
Complaining about a skills shortage is like complaining about the weather – our energy might be better directed to working around it – in other words play the hand you’re dealt, search harder for talent, be open minded about what talent might look like, be prepared to help create talent, and be prepared to invest more time in the search for talent.
Perhaps the mistake many employers may make in the recruiting process is to approach the market with a desperate need to fill a hole, coupled with an arbitrary set of required attributes (pay range, certifications, experience level etc.). These attributes are often nothing more than a wish list.
When they find it hard to fill the position, they blame it on the “skills shortage”, and everyone nods in sober agreement, perhaps throwing in that the government needs to “do something about it”.
Knowledge workers are becoming increasingly specialised. In addition to a vast array of niche specialist technology roles, functional roles, vertical market specialisations, and technology product specialisations, there are vastly different levels of experience. That’s not even considering the range of personalities, and the impact that has on the suitability of a candidate for a role. Then you have to throw in that NZ is a very small market – some skillsets are held by only a handful of people in the entire country.
The result is that every knowledge worker is different, and the dated concept of one “resource” being a substitute for another, as if knowledge workers are some kind of high-tech labourer, is very 20th century.
Employers could instead take on keen apprentices and put them alongside their more experienced staff. When that is done right, there’s a kind of magic that happens when skills rub off, often at an astonishing rate. By having a range of experience levels on a team, from apprentice to senior, the employer should have options to promote from within. Then they’d have the option of hiring a new apprentice when a senior member moves on, and moving everyone in between up a notch.
I posted an answer to the following question on StackOverflow today, and I thought I’d publish it here too.
Q: “After reading many tutorials (which seems contradictory to each other) on REST based web services, I was wondering whether we can/should use SOAP to send/receive messages in REST based web services ?”
SOAP is an XML based messaging format for the exchange of data. SOAP also defines a means for making remote procedure calls. SOAP is an open standard from the W3C, and it is agnostic about the underlying transport layer: frequently HTTP is used, but it can happily run over SMTP, TCP and other transports too.
REST is an architectural style (not a standard), so be careful not to compare REST and SOAP directly, because you are not comparing apples with apples. REST takes HTTP and uses it the way it was meant to be used, with all its subtleties and richness. The REST architectural style can be used to transfer data in any format – it does not mandate any particular data format. So SOAP is a perfectly good serialization format for a REST style web service. But many people use JSON, XML, plain text and many other formats with REST. You can happily exchange binary data over REST too, like image files. The nice thing is you get to choose the data format that makes most sense for your application.
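The "any format" point is usually realised through HTTP content negotiation: the client asks for the representation it wants via the Accept header, against the same resource URI. A small sketch (the host and path are placeholders; no request is actually sent here):

```python
import urllib.request

def build_request(fmt):
    """Build a GET for the same resource, asking for a chosen representation."""
    accept = {"json": "application/json",
              "xml": "application/xml",
              "soap": "application/soap+xml"}[fmt]
    return urllib.request.Request(
        "https://api.example.com/orders/42",   # one URI, many representations
        headers={"Accept": accept},
        method="GET")

req = build_request("json")
assert req.get_header("Accept") == "application/json"
assert req.get_method() == "GET"
```

Swap `"json"` for `"soap"` and the same resource can be served as a SOAP envelope – the architectural style doesn’t care.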
Note that since REST is a pattern, not a standard, there is a lot of debate about what it means to be truly RESTful. There is a concept called the Richardson Maturity Model which lays out a series of steps towards the REST ideal. By comparing with Richardson’s model we can see exactly how RESTful a particular implementation is. WS-I BP web services are at Level 0 on this scale (i.e. not very RESTful at all, just using HTTP as a dumb transport layer).
I would say this about choosing between REST and WS-I BP web services: it depends on your audience. If you are developing a B2B type interface within an enterprise, it is more common to see WS-I BP web services. Because there is an underlying standard, because of the mature support from enterprise vendors (such as IBM, Oracle, SAP and Microsoft), and because of the level of framework support, particularly in .NET and Java, WS-I BP makes a lot of sense when you need to get something going quickly, you want to make it easy for clients to connect in an enterprise environment, and the data being exchanged is business data that serializes nicely as SOAP.
On the other hand, if you are exposing web services to the wider web audience, I would say there has been a trend away from WS-I BP and towards the RESTful style. Because REST only assumes the client supports HTTP, it can be made to interoperate with the widest possible audience. REST also gives you the scalability of the web itself, with support for caching of resources etc., which means it will scale up to a large audience much better than WS-I BP web services.
The landing page is going to leap out at every staff member every time they open their web browser. When staff members talk about “The Intranet” they are talking less about all those documents, applications, training materials and forms, and more about the home page.
That’s why the landing page really counts. You already have the documents, applications, training materials and forms – the trouble is finding and accessing them. The landing page is to the Intranet what Google is to the web – a veneer over the top of all this that will make or break it. The landing page is THE page that everyone comes to. It should be the most dynamic page on the whole Intranet, updated frequently with the most important information, and nothing more.
The Intranet is a strange beast – it’s kind of like your own private Internet, but with a lot of strange quirks, such as legacy technologies, security paranoia, and corporate politics thrown in for good measure. The key to making a good Intranet for your organization is to forget all of these things, and just make your Intranet fast and friendly, and have people want to use it – not because they are required to, but simply because it’s the best way to find the information they need.
I wanted to share some thoughts on the subject of the corporate Intranet, so I have published a lightweight whitepaper to help organizations to focus on what really matters when building an Intranet. There’s less than 4 pages of reading, so I encourage you to have a look and leave your thoughts in a comment here.
Read it here: The Intranet Landing Page