19 September 2006 - 21:22The assembly line

A while ago I worked on a project that was laid-out as an assembly line: it had an entry point which did some work after which it sent a message out to a queue. This queue was taking the message, did some work, transformed it and send it out to another queue. And so on… At the end something was produced that was stored in a DB. It was the first time I saw this architecture and I found it interesting. It is essentially message-passing done in a sequential order, in this particular project there was no branching, no task could pass a message to multiple dependent tasks. It is obvious that the architecture could support branching, but there was no need for it in this particular project.
In this posting the term “task” will represent a task that is deployed on a node in a cluster.
I grew interested in this architecture because of the similarity it has with an assembly line. I am interested in economic concepts applied to software development and such similarities attract my attention. So, rather than talk in this post about message passing programming (you can find some good literature on the web) I’ll take a look at the effects of this architecture from a managerial point of view and I will compare this architecture to the same architecture running in a single JVM. You can set-up an interaction similar to an assembly line in one JVM by passing messages between various business logic beans. From the computer science point of view the main difference between the 2 architectures is the treatment of state. If the process that is carried out on this assembly line requires that state is kept across the whole process then a distributed assembly line is probably not the best choice. If you do go ahead with this architecture then you will have to find a way to propagate this state from task to task across a cluster.

So what would be side-effects of this architecture?
The first and most important side-effect is that the failure of a task doesn’t block the process, the rest of the items that have to be processed go on. Error reporting becomes more fine-grained, but also more difuse. You don’t have to scan a log that records every step of the process in order to get to the task’s stack trace because the node on which you have the log is dedicated to that particular task. One problem is that the error which is reported is particular to that task, you do not get the benefit of seeing the whole process dumped in a stack trace. In order to determine what went wrong you have to back-track from the current task to the previous task and in order to capture the whole information pertaining to that process instance. Trying to do this along a process spread across a cluster is not the easiest thing… In order to make this more easier you would have to propagate information considered helpful from task to task in order to report errors more meaningfully.
We would also have fine grained scalability. Once the process has been split into tasks and the communication between these tasks implemented you can scale each task independently of another. The result is that a computationally-intensive task cand be carried out over a larger part of the cluster than one that doesn’t require so many resources. The bad thing about fine grained scalability is that sometimes you need coarse-grained scalability. For example, let’s say that your organization plans to increase the usage of this process by a factor of 4, it is moving from processing 1000 items to 4000 items. You would basically have to scale the computational resources for every task 4 times and then test the new cluster. For a process with a large number of tasks it could be tedious.

Another bad side-effect is the difficulty with which you version such an architecture. If your process spreads across a whole cluster you may have to replicate the whole cluster and make sure that all the relationships between tasks are maintained in the new version. If you have a high tolerance to pain you may try to mix-and-match tasks from different versions: for example task 15 from version 4 would be related to task 16 from version 3 while waiting messaged from task 14 deployed in version 2, 4 and 5.

One good point about this architecture which is being brought up is the fact that you could reuse a task. Well, this doesn’t really apply because code-reuse is related to the specificity of a task: the more specific is a task, the less the potential for reuse.

To conclude this posting I would say that this architecture is a pretty good architecture when a process consists of a series of transformations, it decomposes the process meaningfully. Most of the side-effects are related to the length of the process, so it would help to keep the number of tasks under control. In order to better manage your process you may find that you may have to propagate additional information from task to task apart from the tasks require for carrying out their work.
If you have any other thoughts on this particular architecture or you have worked with such processes please drop a comment.

2 Comments | Tags: Development, Econo-computing, Favorites

15 September 2006 - 21:53Über architects and über developers

Sometime ago I was going thru the codebase of a project in order to become more accustomed with it when I detected a code-smell: an interface used for error handling had 2 very similar methods: one for producing a list of issues and another one for taking a list of issues and producing another list of issues from it. The methods were so similar to the point of being identical, and an inspection of the class tree derived from this interface confirmed my suppositions: the second method was basically appending the issues from the first method to the input list.

Obviously, this is a pretty bad error. The result was, among other things, a lot of copy and paste, a lot of redundant code and a lot of gotchas. This was pretty bad. I don’t know why, but the first thing that got into my mind was that this thing should be scrapped, which was obviously a very bad thing. Solely the fact that that code has gone from a half of dozen rounds of testing and has been deployed in production for a year made this code very valuable, because re-testing a new code-base implementing the same functionality would have been prohibitive. The correct way to look at it was not as at a pile of garbage, but rather as at some building blocks that could be used further. New contracts should have been developed for better alignment with the process which was getting implementing and the existing code blocks should have been adapted to these new contracts.

I was once listening to a JEE expert talking about a failed JEE project. He was talking with the lead architect and pointed out a flaw in their architecture. The architect decided to neglect the flaw and go ahead because time was running out. The end-result, according to the JEE expert, was that 12 months later the work of 40 developers was scrapped because of that flaw. I refuse to believe this. I cannot believe that the work done by those developers for one year could not be salvaged. I don’t know. I have seen projects going the way of the do-do bird but at the end various pieces could be salaged from the mess they were in. For example, a catalog module of an e-commerce application that never went live could be abstracted and plugged into another application (say, an application managing photo portofolios). Sometimes the application is way less than the sum of its parts…

Regarding the team in which you are working: odds are pretty good that you will not get a Rod Jonson to architect your application and that your team will be composed from pretty ordinary developers. Well, they are the people with whom you will develop your applications. Chances are they will make mistakes, chances are you will make mistakes, what is important is to deliver what you have to deliver. You have to find value in what has been developed and exploit that value as much as possible. (Looking at what has been developed from a functional perspective helps). Chances are whatever you are working on will change later and become obsolete. You have to deal with this change, and waking up one day to declare a whole code-base obsolete is not the right way to go about it.

No Comments | Tags: Development