27 February 2008 - 22:53
Some of today’s development problems

I was reading this post about how architects should approach a project by treating their team as a stakeholder, which is probably a bit misguided. I would not say that the team is a stakeholder, because stakeholders are usually financially invested in the project and that investment drives their attitude towards it. The team is involved in the project, but its involvement is very different from that of the stakeholders.

However, the article sheds light on the fact that you cannot ignore the team when embarking on a new project, and it explores two different problems: the effects of team distribution and of team skills on the architecture of the project.

Let’s start with the team skills: the article argues that when you embark on a new project you should center its architecture around the strengths of the team. Well, this is so obvious (you need to make the most of your team) that it doesn’t need to be restated. The problem he poses is a bit unusual: the architect thinks that Ruby should be used for a project, but the team only groks Java. He goes on to say that the architect should probably use Java and evangelize Ruby so that the team learns it. Well, the team will probably know some Ruby 6 to 12 months into the evangelization process, by which time so much development will have been done in Java that it is too late to contemplate using Ruby on this project. Maybe next time…

The other subject of the post is how team distribution affects architecture; the author shows a bad way of architecting an application without any regard for team distribution and composition. I would say that he is right: when working in distributed teams it is essential to give each team a certain amount of autonomy, so that it can make decisions on its own and not get gridlocked on dependencies on other teams. However, setting up autonomous teams is mostly a managerial issue and not a “pure” architectural issue. As you can see, the border between architecture and management starts to blur in certain cases.

It is a pretty interesting article which outlines some recent developments in today’s IT environment: polyglot development and distributed teams. These are managerial issues (maximizing resource utilization) cloaked in architectural clothes (if I can use this expression ;-)).

No Comments | Tags: Management

22 February 2008 - 3:16
DTOs are not rich beans

Of all the problems of Data Transfer Objects (DTOs), the one that stands out is the anemic domain model problem: DTOs are basically objects with a few fields and getters/setters for them, lacking any complex behavior. This way of developing applications has been decried time and time again, so I will not spend time expounding on it.

However, I started realizing that maybe DTOs should not contain any complex behavior and that they should be as dumb as dirt, especially when they are used for communication between multiple systems. Consider this example: we have a tax system which exposes the object TaxInformation, which various systems use for getting information about taxes. Let’s say that the commodity trades system, the equity trades system and the FX trades system all use it, and that the object looks like this:

public class TaxInformation {

    private Trade trade;
    private TaxReceipt taxReceipt;

    public TaxInformation(Trade trade) {
        this.trade = trade;
        this.taxReceipt = new TaxReceipt();
    }

    public TaxReceipt getTaxReceipt() {
        return taxReceipt;
    }
}

As you can see, this is a DTO dumber than dirt. Let’s create an endpoint that creates and serves such an object:

public class TaxService {

    public TaxInformation getTaxInformation(Trade trade) {
        TaxInformation taxInfo = new TaxInformation(trade);
        return taxInfo;
    }
}

All is fine, but a requirement comes in which says that if the trade is restricted then the tax receipt of the tax information object related to that trade should also be restricted. Here you are presented with a few choices:

1) Implement this requirement in the TaxInformation object, probably in the constructor.

2) Implement this requirement in the TaxService object.

3) Implement this requirement in the TaxReceipt object, in its constructor. For the sake of argument, let’s assume that this possibility is not available to us, so it is off the table.

OOP purists will immediately recommend choice #1, because it would turn the DTO into a rich bean and do away with the anemic DTO. This is a mistake, because all the systems using this object to communicate with the tax system will start getting class versioning errors unless they are updated with the DTO’s new class. The problem is versioning such a DTO. Putting complex behavior in a DTO raises the risk of the DTO changing, which raises the effort of propagating each change to the systems that use the DTO for communication. At some point it makes sense to drop DTOs and use some sort of protocol for communication, preferably one which deals with change pretty easily (XML documents with schemas which only ever get extended and with constraints which change very rarely are a pretty good fit).
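As a rough illustration of choice #2, the rule can live in the service while the DTO stays a plain data carrier. This is only a sketch under assumptions: the post never shows Trade or TaxReceipt, so the isRestricted and setRestricted accessors below are hypothetical, and the receipt is assumed to already carry a restricted flag.

```java
// Hypothetical minimal types: the post does not show Trade or TaxReceipt,
// so the "restricted" flags are assumptions made for this sketch.
class Trade {
    private final boolean restricted;

    Trade(boolean restricted) {
        this.restricted = restricted;
    }

    boolean isRestricted() {
        return restricted;
    }
}

class TaxReceipt {
    private boolean restricted;

    boolean isRestricted() {
        return restricted;
    }

    void setRestricted(boolean restricted) {
        this.restricted = restricted;
    }
}

// The DTO keeps the shape from the post: fields and accessors, no rules.
class TaxInformation {
    private final Trade trade;
    private final TaxReceipt taxReceipt;

    TaxInformation(Trade trade) {
        this.trade = trade;
        this.taxReceipt = new TaxReceipt();
    }

    TaxReceipt getTaxReceipt() {
        return taxReceipt;
    }
}

// Choice #2: the business rule lives in the service, not in the DTO.
class TaxService {
    TaxInformation getTaxInformation(Trade trade) {
        TaxInformation taxInfo = new TaxInformation(trade);
        // A restricted trade yields a restricted tax receipt.
        taxInfo.getTaxReceipt().setRestricted(trade.isRestricted());
        return taxInfo;
    }
}
```

With this arrangement the TaxInformation class shipped to the trade systems does not change when the rule changes; only the tax system has to be redeployed.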

DTOs should probably be used in systems whose lifecycles are kept in sync (i.e. they get upgraded together); in such systems you can propagate type changes pretty easily because you have a certain amount of control over their lifecycles (*). Expanding on this, I would say that typed languages are probably most effective locally, because type changes cannot be propagated efficiently over large distances. Type coupling (or API coupling) is a very hard coupling, and it should be avoided and replaced with protocols. DTOs do create an anemic model and should be avoided if possible, but if you choose types for communication between systems it makes sense either to keep these DTOs to their most basic function of passing data around or to have procedures for propagating type changes to all the systems that use those types.
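To make the protocol alternative a bit more concrete, here is a minimal sketch of a consumer that reads only the elements it knows about and ignores the rest, so the document can be extended without breaking it. The element names (taxInformation, tradeId, restricted) and the TaxMessageReader class are invented for illustration; only the standard JAXP DOM parser is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical reader: it looks up elements by name and never fails on
// elements it does not know, so extended documents remain readable.
class TaxMessageReader {

    // Returns the text of the first element with the given name,
    // or the fallback when the document does not contain it.
    static String read(String xml, String elementName, String fallback) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(elementName);
            return nodes.getLength() == 0 ? fallback : nodes.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed tax message", e);
        }
    }
}
```

An older consumer that only asks for tradeId keeps working when the tax system starts adding a restricted element, and a newer consumer that asks for restricted can fall back to a default when talking to an older tax system. Neither side needs a new class from the other.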
Sometimes weak DTOs are actually a good thing…

* BTW, whether a set of systems whose lifecycles are kept in synch is actually one big system with one particular lifecycle is a pretty good question. I would say that you can think about it as a big system.

No Comments | Tags: Development

6 February 2008 - 22:34
Voyage, voyage

Au dessus des vieux volcans,
Glisse des ailes sous les tapis du vent,
Voyage, voyage,
Eternellement.
De nuages en marécages,
De vent d’Espagne en pluie d’équateur,
Voyage, voyage,
Vole dans les hauteurs
Au dessus des capitales,
Des idées fatales
Regardent l’océan…

Voyage, voyage
Plus loin que la nuit et le jour, (voyage voyage)
Voyage (voyage)
Dans l’espace inouï de l’amour.
Voyage, voyage
Sur l’eau sacrée d’un fleuve indien, (voyage voyage)
Voyage (voyage)
Et jamais ne revient.

Sur le Gange ou l’Amazone,
Chez les blacks, chez les sikhs, chez les jaunes,
Voyage, voyage
Dans tout le royaume.
Sur les dunes du Sahara,
Des iles Fidji au Fujiyama,
Voyage, voyage,
Ne t’arrêtes pas.
Au dessus des barbelés,
Des coeurs bombardés
Regardent l’océan.

Voyage, voyage
Plus loin que la nuit et le jour, (voyage voyage)
Voyage (voyage)
Dans l’espace inouï de l’amour.
Voyage, voyage
Sur l’eau sacrée d’un fleuve indien, (voyage voyage)
Voyage (voyage)
Et jamais ne revient.

Au dessus des capitales,
Des idées fatales
Regardent l’océan.

Voyage, voyage
Plus loin que la nuit et le jour, (voyage voyage)
Voyage (voyage)
Dans l’espace inouï de l’amour.
Voyage, voyage
Sur l’eau sacrée d’un fleuve indien, (voyage voyage)
Voyage (voyage)
Et jamais ne revient.

No Comments | Tags: Personal

1 February 2008 - 20:56
Asynchronous processing and OOP

I was reading this post on InfoQ about how good object-oriented programming closely mimics asynchronous processing. It is a pretty interesting post, even though it is not very organized: the author surfs around a few concepts, finds some interesting relationships between them and then ends the article.

The post looks for similarities between implementing a piece of functionality in an asynchronous manner and implementing the same functionality with methods that do not return values. It pretty much finds these similarities, but there are some important differences which we will explore now.
Let’s take a look at some code: suppose that you have a purchase order component with a method placeOrder for placing an order:
public class PurchaseOrderBean {

    // paymentBean, inventoryBean and backOfficeBean are collaborators set up elsewhere

    public void placeOrder(PurchaseOrder po) throws PurchaseOrderException {
        try {
            // update its status in the DB
            changePurchaseOrderStatus(po);
            // execute payment
            paymentBean.performPayment(po);
            // update the inventory
            inventoryBean.updateInventory(po);
            // integrate with the back office
            backOfficeBean.notifyBackOffice(po);
        } catch (BackOfficeException ex) {
            // handle BackOfficeException, possibly rolling back the current transaction
        } catch (InventoryException ex) {
            // handle InventoryException, possibly rolling back the current transaction
        } catch (PaymentException ex) {
            // handle PaymentException, possibly rolling back the current transaction
        }
    }
}

As you can see, in this case the method placeOrder simply tells the payment bean, the inventory bean and the back-office bean to do their work.

Now let’s implement the same method in an asynchronous manner using JMS and QueueSenders:
public class PurchaseOrderBean {

    // the three QueueSenders are created elsewhere, one per target JMS queue

    public void placeOrder(PurchaseOrder po) throws PurchaseOrderException {
        try {
            // update its status in the DB
            changePurchaseOrderStatus(po);
            // execute payment
            queueSenderToPaymentJMSQueue.send(formatMessage(po));
            // update the inventory
            queueSenderToInventoryJMSQueue.send(formatMessage(po));
            // integrate with the back office
            queueSenderToBackOfficeJMSQueue.send(formatMessage(po));
        } catch (JMSException ex) {
            // handle JMSException. Usually it means rolling back the transaction
            // and sending this exception upstream.
        }
    }
}

As you can see, the biggest difference between the two implementations is exception handling. In both cases you are telling a component (in the first case a bean, in the second case a different system) to carry out its work, but only in the first case do you get to deal with its exceptions. The big difference I see is that with asynchronous processing you delegate the whole implementation of a piece of functionality, including its failures, while with synchronous processing dealing with failure can be left to the method that orchestrates the interactions between these components.
I consider this difference to be important when designing systems because it essentially splits tasks and responsibilities.
The post was right on the money about the need to avoid methods whose return values have to be dealt with in the calling method, because coding this way encourages you to write components to which a task can be fully delegated. A method that returns a value basically requires you to interpret that value outside the method, which means that not everything was delegated to it. Sometimes you need to code this way though…
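The contrast between a method whose result must be interpreted by the caller and a method that owns its outcome can be sketched roughly as follows. Every name here (AskingPaymentBean, TellingPaymentBean, OrderPlacement, the status codes) is hypothetical and exists only to show the two calling styles side by side.

```java
import java.util.ArrayList;
import java.util.List;

// "Ask" style: the method returns a status that the caller must interpret.
class AskingPaymentBean {
    static final int OK = 0;
    static final int CARD_EXPIRED = 1;

    int performPayment(boolean cardExpired) {
        return cardExpired ? CARD_EXPIRED : OK;
    }
}

// "Tell" style: the method returns nothing and owns the failure path itself.
class TellingPaymentBean {
    // Records what the bean decided to do; stands in for real side effects.
    final List<String> actions = new ArrayList<>();

    void performPayment(boolean cardExpired) {
        if (cardExpired) {
            actions.add("payment rejected, customer notified");
        } else {
            actions.add("payment captured");
        }
    }
}

class OrderPlacement {
    // The caller has to know what each status code means and react to it.
    static String placeByAsking(AskingPaymentBean bean, boolean cardExpired) {
        int status = bean.performPayment(cardExpired);
        return status == AskingPaymentBean.OK ? "order placed" : "order cancelled";
    }

    // The caller only hands the task over; nothing is left to interpret.
    static void placeByTelling(TellingPaymentBean bean, boolean cardExpired) {
        bean.performPayment(cardExpired);
    }
}
```

In the first style part of the payment logic leaks into OrderPlacement; in the second the payment bean could be swapped for a message queue without the caller noticing.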

Now let’s dig a bit into the way exceptions are handled in these two pieces of code. Let’s suppose that the payment fails for a particular purchase order because the credit card has expired. In the first example all processing would basically stop at this step: we would roll back the transaction and report the error upstream.
In the second example it is a bit more complicated, because the payment would fail inside the payment system, which would then have to notify the back office and the inventory system that this particular purchase order has failed (probably the most efficient way to do this would be to publish a message about the failed payment on an ESB channel to which the other two systems are listening. Connecting the payment system directly to the systems it is related to is tight coupling and should be avoided. *).
Exception handling is done differently in the two implementations: the first deals with the exception right at the source, while the second implements exception handling by passing messages between the systems in the background. The second implementation is also more prone to tight coupling between systems.
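The failed-payment notification described above can be sketched with a tiny in-memory stand-in for the ESB channel. This is not how a real ESB or JMS topic is programmed; FailedPaymentChannel and the three system classes are hypothetical, and the point is only that the payment system publishes one message without knowing who listens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// In-memory stand-in for one ESB channel carrying "payment failed" messages.
class FailedPaymentChannel {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // Delivers the purchase order id to every subscriber.
    void publish(String purchaseOrderId) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(purchaseOrderId);
        }
    }
}

// Releases reserved stock when it hears about a failed payment.
class InventorySystem {
    final List<String> releasedOrders = new ArrayList<>();

    InventorySystem(FailedPaymentChannel channel) {
        channel.subscribe(id -> releasedOrders.add(id));
    }
}

// Cancels its own record of the order when it hears about a failed payment.
class BackOfficeSystem {
    final List<String> cancelledOrders = new ArrayList<>();

    BackOfficeSystem(FailedPaymentChannel channel) {
        channel.subscribe(id -> cancelledOrders.add(id));
    }
}

// Knows only the channel, not the systems listening on it.
class PaymentSystem {
    private final FailedPaymentChannel channel;

    PaymentSystem(FailedPaymentChannel channel) {
        this.channel = channel;
    }

    void process(String purchaseOrderId, boolean cardExpired) {
        if (cardExpired) {
            channel.publish(purchaseOrderId);
        }
    }
}
```

Adding a fourth interested system means adding a subscriber; the payment system itself does not change, which is the loose coupling the channel is meant to buy.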

One cost involved in the above implementations is the cost of exception handling. In synchronous processing exception handling is done up front and is usually not expensive, since you handle the exception at the source. In asynchronous processing exception handling usually occurs behind the scenes, with systems passing each other messages, so you should pay attention to it when designing systems. If you can delegate the whole processing, including failure handling, to a component, it probably makes sense to use asynchronous processing. If you find that some failures involve multiple systems, it probably makes sense to use synchronous processing.

Another cost involved in the above example is the cost of locking down resources. In the synchronous example this cost is pretty high: you need to start a transaction, start processing, lock down database rows, etc., and roll back that transaction in the event of failure. In the second example this cost is extremely low: you start a transaction, send some messages and commit the transaction.
You essentially have these two costs, and each implementation pays a different one: the asynchronous implementation has a high complexity cost (passing messages about exceptions behind the scenes) while the synchronous implementation has a high physical cost (the cost of locking down resources pending a transaction commit). This cost structure should be kept in mind when designing systems (**).

I would end this post by saying that while most people seem to associate asynchronous processing with low-latency method calls, delayed execution and relaxed time constraints, it is important to bear in mind that it should also be associated with the way you delegate a particular task to a particular system, which is primarily a management issue. What I wanted to stress is that asynchronous messaging forces you to fully delegate processing (including exception handling) to a different component or system. I see this hand-off as just as important as the other things that asynchronous processing is associated with, such as a faster return from the call because you are simply passing a message rather than waiting for its execution.

* One caveat though. It is a good thing to keep the number of message types that are sent through an ESB under control, otherwise you will end up with a Spaghetti Oriented Architecture ;-).

** The present push away from ACID transactions towards systems passing messages between each other would imply that the physical cost is larger than the complexity cost. At the same time ESBs are becoming more and more popular in order to keep the cost of the relationships between various systems low.
Transactions can be viewed as a way to enforce state synchronization between multiple actors: these actors try to carry out a series of actions at the same time, and if one of them fails they should all go back to their original state. ACID transactions enforce state synchronization at the time the actions are carried out; in some cases, however, you can carry out this state synchronization later.
Please check these links for some unusual transaction mechanisms. Some of them (especially XTP) will very likely become de-facto standards:
Are Cross-Service Transactions A Violation of the Autonomous Tenet of Service Orientation?
Xtreme Transaction Processing under SBA terminology
Large-scale distributed transactions
Constrained Tree Schemas (CTS) and applications (CTA) for extreme OLTP (XTP)

Note: This post was written while listening to Mylène Farmer live at Bercy 2006.

P.S. This post is also quite all over the place: exception handling in async interactions vs. synchronous processing, transactions, etc… I guess Mylène Farmer can be quite distracting ;-).

No Comments | Tags: Favorites, Management