3 November 2008 - 14:24 Task allocation and designing interactions

As I was saying in my previous post, a lot of technological issues (such as the REST vs. WS-* debate) are actually organizational issues. At the bottom of them you will find that the process of communication has been broken down into tasks (namely the task of payload-to-processor mapping) and that those tasks have been allocated (externalized or internalized) according to the transaction costs of each particular environment.

I would follow this by saying that a lot of time can be saved if the interactions between various parties are broken down into tasks and those tasks are allocated according to the nature of the organization(s) within which these interactions are carried out. This applies not only to message passing but to any other environment in which various entities collaborate.
For example, you may find that DSLs will not work in a particular setting not because DSLs are bad, but because the organization within which these DSLs are implemented and used is simply not set up for them to succeed.

No Comments | Tags: Favorites, Management

16 October 2008 - 19:31 REST and WS - part 5

One of the best presentations that I have seen lately is Mark Little talking about REST and WS at QCon London 2008, or rather about the differences between uniform interfaces and specific interfaces. Mark Little gets right to the bottom of the differences between REST and WS, and not all of these differences are differences between architectural styles; some are differences between organizations.

One of the best interpretations of the differences between REST and WS that I have seen is the one that Mark makes in this presentation, and I will try to reproduce it below. It concerns the task of mapping a message sent to an entity (could be a service, could be a URL) to the component/processor within that entity that will process it. When you send a message to a specific interface you specify the method and the component that will process that message, so the message-to-processor mapping is done by the client. When you send a message to a uniform interface, on the other hand, you need to attach some meta-data to that message, which the entity implementing the uniform interface uses to map the message to its appropriate processor, so the message-to-processor mapping is done by the entity.

The message-to-processor mapping is externalized in the case of specific interfaces, while the uniform interface internalizes it. These different mappings come with different transaction costs: in the case of the external mapping the costs are those of synchronizing the clients with changes in the specific interface, while in the case of the internal mapping the costs are those of adapting the uniform interface’s mapper to changes in the meta-data (a new type of message has to be mapped to a new processor).
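To make the two mappings a bit more concrete, here is a minimal Java sketch. All the names in it (OrderService, UniformEndpoint, the "type" meta-data key) are my own assumptions for illustration and do not come from Mark's presentation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Specific interface: the client names the operation it wants, so the
// client performs the message-to-processor mapping.
interface OrderService {
    String placeOrder(String payload);
    String cancelOrder(String payload);
}

class SimpleOrderService implements OrderService {
    public String placeOrder(String payload)  { return "placed:" + payload; }
    public String cancelOrder(String payload) { return "cancelled:" + payload; }
}

// Uniform interface: a single entry point. The receiving entity maps the
// message to a processor using the meta-data attached to the message.
class UniformEndpoint {
    private final Map<String, Function<String, String>> processors = new HashMap<>();

    // Adapting the mapper to a new message type is the cost paid on this side.
    void register(String messageType, Function<String, String> processor) {
        processors.put(messageType, processor);
    }

    String handle(Map<String, String> metaData, String payload) {
        String type = metaData.get("type"); // hypothetical meta-data key
        Function<String, String> processor = processors.get(type);
        if (processor == null) {
            throw new IllegalArgumentException("No processor for type: " + type);
        }
        return processor.apply(payload);
    }
}

class InterfaceStylesDemo {
    public static void main(String[] args) {
        OrderService specific = new SimpleOrderService();
        // Mapping done by the client: it picks the method itself.
        System.out.println(specific.placeOrder("po-1"));

        UniformEndpoint uniform = new UniformEndpoint();
        uniform.register("order.place", specific::placeOrder);
        uniform.register("order.cancel", specific::cancelOrder);
        // Mapping done by the entity: the client only attaches meta-data.
        System.out.println(uniform.handle(Map.of("type", "order.place"), "po-1"));
    }
}
```

Adding a method to OrderService forces every client to be synchronized with the change, while adding a message type to UniformEndpoint only touches the mapper inside the entity.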
Now, if we apply economics to the task of message-to-processor mapping we start to get some interesting results. According to Coase’s theory of the firm, you can externalize an activity if the transaction costs of carrying out that activity are low enough, and if the transaction costs are high then that activity is better internalized. It follows that when the transaction costs of this mapping are low you can externalize it via a specific interface; otherwise you are better off internalizing it via a uniform interface.
The transaction costs of the specific interface are those of synchronizing distributed clients, and they can be contained either thru versioning, by not modifying the specific interface, or by standards. If you can keep these costs down you can delegate the message-to-processor mapping efficiently to the client; otherwise you are better off with a uniform interface. Keeping these costs down also implies keeping down the number of exposed methods in order to limit the number of changes (more methods, more opportunities for changing them). Generally speaking, interaction via specific interfaces is best limited to a few coarse-grained components which do not change very often, shared between a few participants.
The above synchronization costs grow with the number of clients: as the number of clients grows it becomes harder to synchronize all of them, and eventually the transaction costs of using the specific interface become so high that it no longer makes sense to use it. One corollary of this is that the uniform interface is far better at the scale of the internet, and this is probably the main reason why it is used at this scale.
The above should provide a pretty good guide to an architect trying to figure out whether to use a specific interface or a uniform one, because their use is not driven by technology, but rather by the organization(s) within which they are employed.

Another interesting point in his presentation was the effort to create a REST-ful standard for carrying out distributed transactions and the need for agreement between multiple parties. As I was saying in a previous post about REST and WS, the differences between these 2 camps are more of an organizational nature than anything else. Mark touched upon the fact that even though the need for carrying out distributed transactions via REST exists there is still no standard for doing so, and that the REST community seems pretty much happy with implementing low-level infrastructure rather than higher-level, and higher-value, components. I find this a pity, and I have the impression that this need will be fulfilled only when it rises to a level at which major corporations find it profitable to get involved. The REST community appears simply way too fragmented to be able to carry out this effort on its own…

Anyway, go ahead and watch the presentation, it is the most thoughtful presentation on distributed computing that I have seen in a while.

No Comments | Tags: Econo-computing, Favorites, Management

10 October 2008 - 15:47 ESB transformations

An ESB is an essential enterprise computing capability which mediates connections between various parties. Typical ESB usage consists of message producers sending messages to an ESB, which transforms them and then forwards them to message consumers.

It is at the transformation stage that the thorniest issues arise with ESBs. As outlined in this presentation from ThoughtWorks, there is a tendency to push business logic inside the ESB, resulting in a very tight coupling between applications and the ESB (part of the business logic will be stored in applications, part of it in the ESB, and the business logic in the ESB will need to be kept in sync with the business logic in the applications). Not a nice scenario, and unfortunately the solution provided in the presentation (apparently you only need to use Guerrilla SOA) doesn’t say much. The business logic which gets shoved into the ESB in the above example consists of the relationships between the various parties which are sending each other messages, and a mess comes out of it as these relationships change over time.

One example of ESB usage is a customer-facing web application which takes purchase orders from customers and sends messages to an inventory and shipping system, to the accounts receivable system and to the customer loyalty system. There are a few ways that these messages can be sent: 1) the web-application sends only one message to the ESB, which then creates 3 more messages, one for the inventory system, one for the accounting system and one for the customer loyalty system (in which case the relationships between the web-app and the 3 systems are managed within the ESB); 2) the web-application creates 3 messages which are all sent independently to the 3 systems (in which case the relationship between the web-application and these 3 systems is managed by the web-app, typically thru dedicated senders which transform the message according to the other party’s specs); or 3) the web-app publishes the message on the ESB, which forwards it as-is to the receiving systems, which receive the message as it was created by the web-app and transform it according to their own rules (in which case the receiving systems manage the relationships with the web-application, typically thru dedicated receiving points which transform the incoming messages).
So far I have only seen 2 and 3 used, and they were used successfully. I have not seen 1 being used successfully; rather, I have seen ESBs simply relegated to the role of a dumb pipe while the processing is done on the end-points. I think that the main reason for which the ESB cannot grow beyond the role of a dumb pipe is that the transformations typically involve data: imagine that in the above scenario the IDs of the products on the web-application are different from the ones in the inventory system (a very common case BTW). In order to service this transformation the ESB would need up-to-date mapping data for every relationship that it is managing. Sooner or later that ESB will turn into a giant vacuum cleaner that needs to suck in every byte of data available in order to service the relationships stored in it.
Lesson #1: Managing relationships typically involves managing data.

Let’s go more into the gory details and think about how we would deal with stale data: let’s say that the datastore holding the mappings of product IDs from the webapp to the inventory system is missing a product ID (another very common case). In the case where the relationships are maintained at the connection end-points (message producers or consumers) the operations team managing those end-points would have access to the exception (a missing product ID mapping in this case) as well as to its solution (create this mapping in your mapping store). How will the ESB deal with stale data? Well, the ESB will have to find the entity which can resolve this exception, package the exception and send it to that entity. Sooner or later the ESB becomes a spaghetti bowl.
Lesson #2: ESB has bad exception management capabilities. 
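As a rough illustration of why the end-point is a more comfortable place for this mapping, here is a small Java sketch of a sender owned by the web-application team (option 2 above). The class names, the product IDs and the pipe-delimited output are assumptions made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Raised where the mapping lives, so the team owning the end-point sees
// both the problem and the data needed to fix it.
class MissingMappingException extends RuntimeException {
    MissingMappingException(String webProductId) {
        super("No inventory ID mapped for web product ID " + webProductId);
    }
}

// Hypothetical dedicated sender on the web-application side.
class InventoryMessageSender {
    private final Map<String, String> webToInventoryId = new HashMap<>();

    // The fix for a missing mapping is a local operation.
    void addMapping(String webProductId, String inventoryProductId) {
        webToInventoryId.put(webProductId, inventoryProductId);
    }

    // Translates the product ID and formats a pipe-delimited message
    // in the (assumed) format expected by the inventory system.
    String toInventoryMessage(String webProductId, int quantity) {
        String inventoryId = webToInventoryId.get(webProductId);
        if (inventoryId == null) {
            throw new MissingMappingException(webProductId);
        }
        return inventoryId + "|" + quantity;
    }
}

class EndpointMappingDemo {
    public static void main(String[] args) {
        InventoryMessageSender sender = new InventoryMessageSender();
        sender.addMapping("WEB-1", "INV-1001");
        System.out.println(sender.toInventoryMessage("WEB-1", 2));
        try {
            sender.toInventoryMessage("WEB-2", 1); // stale mapping store
        } catch (MissingMappingException ex) {
            System.out.println("Fix locally: " + ex.getMessage());
        }
    }
}
```

The exception never has to be packaged and routed to somebody else: it surfaces in the same place where the mapping store is maintained.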

Pushing the transformation towards the end-points effectively couples the end-points, because these end-points must share the format (*) of the messages they exchange and, just as importantly, they need to share data. Sharing data is usually done by one party having a procedure to export its data and the other parties using this procedure for creating local replicas of that data and synch-ing them according to a calendar. In addition to that, the parties replicating the data have procedures for overriding the data they receive and adding to it. As far as I have seen it works pretty well, and when it doesn’t work it is more of a management issue on the side issuing the data.

To sum this up, I would keep any data enriching out of the ESB and use the ESB transformations only for format transformations and version transformations (**). One more benefit of keeping the ESB down to routing and format transformations is that its role is well-contained and it doesn’t turn into a wild-card (does this particular problem get handled by the ESB, how, why, when, etc…), making your IT assets more manageable. This results in a neater division of concerns in your overall enterprise architecture.

* It is advised that these messages use text-based formats such as XML, pipe-delimited, etc… rather than APIs. API coupling is one of the most horrendous things that I have seen: synch-ing libraries on multiple end-points at the same time is an incredibly time-consuming activity.

** Version-based transformations are transformations of the same message between 2 different versions of it. Let’s say that one message producer decides to publish a new version of the messages that it has been producing so far. The ESB should be made aware of this and deliver the new version to the consumers that are registered to receive it, as well as create a message in the old version out of the original message and send it to the clients which are still registered to receive the old version. Part of this versioning effort is the transformation from the current version to an older version, and this transformation would be best written by the message producer, which will load a transformer into the ESB for it. The ESB will then be re-configured to apply this transformation to the original message and to route the resulting message on its old route. This way the versioning effort is pushed inside the ESB, messages are delivered transparently to the intended recipients, and recipients do not have to be upgraded when their counterparties upgrade their message format, effectively de-coupling their release cycles.
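The version-based routing described in this footnote could be sketched as follows. This is an in-memory toy and not the API of any real ESB product; VersioningBus, loadDowngrader and the message bodies are names and data I made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

class VersionedMessage {
    final int version;
    final String body;

    VersionedMessage(int version, String body) {
        this.version = version;
        this.body = body;
    }
}

// Toy stand-in for the ESB: routes by message version and applies
// producer-supplied transformers for consumers on older versions.
class VersioningBus {
    private final Map<Integer, List<Consumer<VersionedMessage>>> subscribers = new HashMap<>();
    private final Map<Integer, Function<VersionedMessage, VersionedMessage>> downgraders = new HashMap<>();

    void subscribe(int version, Consumer<VersionedMessage> consumer) {
        subscribers.computeIfAbsent(version, v -> new ArrayList<>()).add(consumer);
    }

    // The producer loads the transformer that turns its new version into the previous one.
    void loadDowngrader(int fromVersion, Function<VersionedMessage, VersionedMessage> transformer) {
        downgraders.put(fromVersion, transformer);
    }

    void publish(VersionedMessage message) {
        // Deliver as-is to the consumers registered for this version.
        for (Consumer<VersionedMessage> consumer : subscribers.getOrDefault(message.version, List.of())) {
            consumer.accept(message);
        }
        // Then derive the older version and route it on the old route.
        Function<VersionedMessage, VersionedMessage> downgrader = downgraders.get(message.version);
        if (downgrader != null) {
            VersionedMessage older = downgrader.apply(message);
            if (older.version >= message.version) {
                throw new IllegalStateException("Downgrader must produce an older version");
            }
            publish(older);
        }
    }
}

class VersioningBusDemo {
    public static void main(String[] args) {
        VersioningBus bus = new VersioningBus();
        bus.subscribe(1, m -> System.out.println("v1 consumer got: " + m.body));
        bus.subscribe(2, m -> System.out.println("v2 consumer got: " + m.body));
        // Assumed change between versions: v2 appended a currency field.
        bus.loadDowngrader(2, m -> new VersionedMessage(1, m.body.substring(0, m.body.lastIndexOf('|'))));
        bus.publish(new VersionedMessage(2, "id=42|qty=3|currency=USD"));
    }
}
```

The producer publishes only the new version; consumers still on version 1 keep receiving version 1 messages without being upgraded.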

No Comments | Tags: Management

23 September 2008 - 19:34 OOP inheritance

Typically OOP inheritance is used for sharing various common methods and a typical example would be something like this:
You have interface Vehicle which has sub-classes Bicycle and MotorVehicle. MotorVehicle in turn has the sub-classes Car and Truck, each with its specifics. Car, Truck and Bicycle are all sub-classes of Vehicle and as such all inherit the same common behavior, such as some object which specifies the speed limits within which each Vehicle should operate (for example a Car should be able to handle higher speeds than a Truck or a Bicycle).
Fair enough, and this way to use OOP inheritance has its reasons for use.
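For reference, the hierarchy above could look roughly like this in Java. I am using an abstract class instead of an interface so that the shared speed-limit check has somewhere to live, and the speed values are arbitrary numbers picked for the example.

```java
// Shared behavior lives in the super-class: every Vehicle knows its speed limit.
abstract class Vehicle {
    private final int maxSpeedKmh;

    protected Vehicle(int maxSpeedKmh) {
        this.maxSpeedKmh = maxSpeedKmh;
    }

    boolean isWithinLimit(int speedKmh) {
        return speedKmh >= 0 && speedKmh <= maxSpeedKmh;
    }
}

class Bicycle extends Vehicle {
    Bicycle() { super(40); }   // illustrative value
}

abstract class MotorVehicle extends Vehicle {
    protected MotorVehicle(int maxSpeedKmh) { super(maxSpeedKmh); }
}

class Car extends MotorVehicle {
    Car() { super(200); }      // illustrative value
}

class Truck extends MotorVehicle {
    Truck() { super(120); }    // illustrative value
}

class VehicleDemo {
    public static void main(String[] args) {
        System.out.println(new Car().isWithinLimit(150));   // a Car handles higher speeds
        System.out.println(new Truck().isWithinLimit(150)); // than a Truck
    }
}
```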

However, one other reason for using inheritance that I see is for creating substitutes in order to partition code bases. In this case you are using sub-classes not because you need some shared behavior to be encapsulated in a super-class, but because you have distributed teams working on the same codebase and you need to minimize collisions between them (a collision would be 2 or more persons working on the same file, and is usually resolved by a merge in CVS). Solving conflicts thru CVS merge is a pretty error-prone process and doesn’t scale out well, so at one point it is necessary to partition development so that different teams can work on the same project without elbowing each other.
Sub-classing in this case would solve this problem by letting a developer substitute their team’s type for another team’s type.

One example is using helper classes which service a POJO and which are used by more than one team:
Let’s say that we have a TaxProcessor class that uses a TaxCalculator for calculating the taxes to be applied to a trade. Let’s say that you have 2 teams, one working on taxing bond trades and one working on taxing equity trades, and that they are both using the same TaxCalculator class on the TaxProcessor. TaxCalculator has the method taxTrade which delegates to taxBondTrade and taxEquityTrade. As time goes on a lot of fixes are put into taxTrade in order to deal with various corner-cases, to the point where 1) TaxCalculator becomes hard to maintain and 2) more than one developer is working on the same method at the same time. A solution would be to sub-class TaxCalculator into BondTaxCalculator and EquityTaxCalculator, have each implement the method taxTrade according to its specifics and then have TaxProcessor use the appropriate class according to the trade type.
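A minimal Java sketch of the split could look like this. The Trade and TradeType types and the flat tax rates are assumptions of mine, there only to make the example run; the point is that each team ends up owning its own file.

```java
import java.math.BigDecimal;

enum TradeType { BOND, EQUITY }

class Trade {
    final TradeType type;
    final BigDecimal amount;

    Trade(TradeType type, BigDecimal amount) {
        this.type = type;
        this.amount = amount;
    }
}

abstract class TaxCalculator {
    abstract BigDecimal taxTrade(Trade trade);
}

// Owned by the bond team; the flat rate is a made-up placeholder.
class BondTaxCalculator extends TaxCalculator {
    BigDecimal taxTrade(Trade trade) {
        return trade.amount.multiply(new BigDecimal("0.01"));
    }
}

// Owned by the equity team; the flat rate is a made-up placeholder.
class EquityTaxCalculator extends TaxCalculator {
    BigDecimal taxTrade(Trade trade) {
        return trade.amount.multiply(new BigDecimal("0.02"));
    }
}

class TaxProcessor {
    BigDecimal process(Trade trade) {
        return calculatorFor(trade.type).taxTrade(trade);
    }

    // The only place both teams share: choosing the calculator by trade type.
    private TaxCalculator calculatorFor(TradeType type) {
        switch (type) {
            case BOND:   return new BondTaxCalculator();
            case EQUITY: return new EquityTaxCalculator();
            default:     throw new IllegalArgumentException("Unsupported trade type: " + type);
        }
    }
}

class TaxDemo {
    public static void main(String[] args) {
        TaxProcessor processor = new TaxProcessor();
        System.out.println(processor.process(new Trade(TradeType.BOND, new BigDecimal("1000"))));
        System.out.println(processor.process(new Trade(TradeType.EQUITY, new BigDecimal("1000"))));
    }
}
```

Corner-case fixes for bonds now land in BondTaxCalculator and fixes for equities in EquityTaxCalculator, so the two teams stop colliding in the same method.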

As you can see from the above, good software models imply good separation of concerns. Good separation of concerns translates into good separation between development teams working in parallel on the same project. Good separation between development teams results in low transaction costs (or interaction costs) between teams, in a more efficient way to delegate work and in a development environment which scales better.

No Comments | Tags: Development, Management

12 August 2008 - 18:38 Relaxing ACID properties as you scale out

A while ago I looked at the transition currently occurring in enterprise computing from ACID transactions to compensation by message passing. I concluded by saying that relaxing ACID properties and achieving state alignment via message passing essentially shifts the costs from the costs of ACID transactions (such as locking database rows) to complexity costs (such as managing the growing number of messages between systems which are compensating for these relaxed transactions). Another side effect is the introduction of latency in this state alignment process: thru ACID transactions state alignment is achieved at the time the operation takes place, while thru message passing it occurs some time later.

I was watching this interview on infoq and I found that Gregor Hohpe pretty much voiced the same opinions: as you scale out it is impossible to achieve state alignment at the time of the operation, and you need to compensate for it by passing messages. These complexity costs are essentially the costs of managing business logic as it spans multiple systems, applications, departments, etc… State alignment in these cases consists primarily of having various business analysts agree on how these systems should interact (should this trade go to this depot for that security for that client) and then implementing the interaction between these systems. One thing that is not very obvious is that the cost of interaction between systems is pretty close to the cost of interaction between the departments running these systems, and that as you scale out you need to be able to accommodate or change the various departments you are encountering (*).

It is interesting to watch the interview from this point of view because you get to see the costly hacks that various organizations had to make in order to get their systems to talk to one another (the horrific “magic account” used in one bank makes one’s hair stand on end). Gregor goes as far as saying that these ways of dealing with these problems are pretty common, so common actually that at one point he gets defensive about the sloppiness which comes thru in these examples. The fact that these problems are wide-spread is a pretty bad excuse; ideally these systems would be architected so that we do not need to add one layer of hacks on top of another in order to get something of value out of them.

At the end of this interview I ended up with the impression that we are on the first iteration of relaxing the ACID properties and that I got a pretty good glimpse into the costs associated with this relaxation. Sometime in the future patterns for relaxing these properties will appear and problems common in all these compensations will be solved. I think that these patterns will be more oriented towards business-logic and management rather than towards technology. Efficient interactions revolve around small transaction costs and minimizing transaction costs will be part of the solution of managing this growing complexity.

* Interestingly enough an SOA strategy will probably take these costs down if it manages to impose procedures so that various departments interact one with another more efficiently. However, I am pretty skeptical that we will see a lot of progress soon on this front.

No Comments | Tags: Management

9 April 2008 - 21:21 Insufficient knowledge as moral hazard

I find that insufficient knowledge about a system or a package poses a moral hazard because it encourages people interacting with that system or package to take all sorts of short-cuts in order to get something done and do so thinking that they know what the effects of their actions are.

Knowing how a package implements some business requirement may make a developer pick a method from that package and use it because it fits their requirements. Unchecked, this may lead to spaghetti code, with pretty much everybody calling methods from all over the place in order to get their job done quickly.
Using restrictive access modifiers (such as private methods) is not really a work-around: if someone thinks they need a private method to be made public because they “know” the package, they will usually make it public without any second thoughts.

The right approach to interacting in a new way with a package or system is to delegate that interaction to that package, i.e. to code this interaction in the package itself if possible. As I was arguing in a previous post, knowledge about a package is one of the transaction costs of interacting with that package. Transaction costs usually delimit who does what, because resources will typically cluster together into larger entities in order to bring these transaction costs down. Transaction costs, when handled correctly, will prevent methods whose usage requires extensive knowledge (and which carry large transaction costs) from being called from all over the application. Methods that get called from all over the application are typically methods about which little knowledge is required, helper methods being a good example.

So I would advise people to make sure that whoever is using their packages either understands them well or knows very little about them; having people in the middle who think they “know” the package may result in incorrect usage of those packages.
I would also advise people to refrain from spreading knowledge about their packages indiscriminately and rather target the recipients of this knowledge carefully.

No Comments | Tags: Management

27 February 2008 - 22:53 Some of today’s development problems

I was reading this post about how architects should approach a project considering their team as a stake-holder, which is probably a bit misguided. I would not say that the team is a stake-holder, because stake-holders are usually financially invested in the project and this investment drives attitudes around the project. The team is involved in the project, but its involvement is very different from the stake-holders’.

However, the article shone light on the fact that you cannot ignore the team when embarking on a new project and tried to explore 2 different problems: the effects of team distribution and of the team skills on the architecture of the project.

Let’s start with the team skills: the article argues that when you embark on a new project you should center its architecture around the strengths of the team. Well, this is so obvious (you need to make the most of your team) that it doesn’t need to be re-stated. The problem that the author poses is a bit unusual: the architect thinks that Ruby should be used for a project, however the team only groks Java. He goes on to say that the architect should probably use Java and evangelize Ruby so that the team learns it. Well, the team will probably know some Ruby 6 to 12 months into the evangelization process, at which time a lot of development has been done in Java, meaning that it is too late to contemplate using Ruby on this project. Maybe next time…

The other subject of the post was how team distribution affects architecture, and the author shows a bad way of architecting an application without any regard to team distribution and composition. I would say that he is right about it: when working in a distributed team it is essential to implement a certain amount of autonomy within the teams, so that each team can make decisions on its own and not get grid-locked on dependencies on other teams. However, setting up autonomous teams is mostly a managerial issue and not a “pure” architectural issue; as you can see, the borders between architecture and management start to blur in certain cases.

It is a pretty interesting article which outlines some recent developments in today’s IT environment: polyglot development and distributed teams. These issues are managerial issues (maximizing resource utilization) which are cloaked in architectural clothes (if I can use this expression ;-)).

No Comments | Tags: Management

1 February 2008 - 20:56 Asynchronous processing and OOP

I was reading this post on infoq about how good object-oriented programming closely mimics asynchronous processing. It is a pretty interesting post, even though it is not very organized: the author surfs around a few concepts, finds some interesting relationships between them and then ends the article.

The post looks for similarities between implementing some piece of functionality in an asynchronous manner and implementing the same piece of functionality with methods that do not return values. It pretty much finds these similarities, but there are some important differences which we will explore now.
Let’s take a look at some code: suppose that you have this purchase order component which has the method placeOrder for placing an order:
public class PurchaseOrderBean {

    // The collaborating beans and changePurchaseOrderStatus are omitted for brevity.

    public void placeOrder(PurchaseOrder po) throws PurchaseOrderException {
        try {
            // update its status in the DB
            changePurchaseOrderStatus(po);
            // execute payment
            paymentBean.performPayment(po);
            // update the inventory
            inventoryBean.updateInventory(po);
            // integrate with the back office
            backOfficeBean.notifyBackOffice(po);
        } catch (BackOfficeException ex) {
            // handle BackOfficeException, possibly rolling-back the current transaction
        } catch (InventoryException ex) {
            // handle InventoryException, possibly rolling-back the current transaction
        } catch (PaymentException ex) {
            // handle PaymentException, possibly rolling-back the current transaction
        }
    }
}

As you can see in this case the method placeOrder simply tells the inventory bean, the payment bean and the back-office bean to do their work.

Now let’s implement the same method in an asynchronous manner using JMS and QueueSenders:
public class PurchaseOrderBean {

    // The QueueSenders, formatMessage and changePurchaseOrderStatus are omitted for brevity.

    public void placeOrder(PurchaseOrder po) throws PurchaseOrderException {
        try {
            // update its status in the DB
            changePurchaseOrderStatus(po);
            // execute payment
            queueSenderToPaymentJMSQueue.send(formatMessage(po));
            // update the inventory
            queueSenderToInventoryJMSQueue.send(formatMessage(po));
            // integrate with the back office
            queueSenderToBackOfficeJMSQueue.send(formatMessage(po));
        } catch (JMSException ex) {
            // handle JMSException. Usually it means rolling-back the transaction and sending this exception up-stream.
        }
    }
}

As you can see, the biggest difference between the 2 implementations is exception handling. In both cases you are telling a component (in the first case a bean, in the second case a different system) to carry out its work, but only in the first case do you get to deal with the exceptions. One big difference that I see between asynchronous and synchronous processing is that in asynchronous processing you are delegating the whole implementation of a particular functionality, including failures, while in synchronous processing you can leave dealing with failure to the method that orchestrates the interactions between these components.
I consider this difference to be important when designing systems because it essentially splits tasks and responsibilities.
The post was right on the money on the need to try to avoid methods returning values which need to be dealt with in the calling methods, because coding in such a way encourages you to write components to which you can better delegate a task. Having to deal with methods that return values basically means that you need to interpret these values outside of the method, and this means that not everything was delegated to the method. Sometimes you need to code this way though…

Now let’s dig a bit into the way exceptions are handled in these 2 pieces of code. Let’s suppose that the payment fails for this particular purchase order because the credit card is expired. In the first example all processing would basically stop at this step: we would roll back the transaction and notify this error upstream.
In the second example it is a bit more complicated, because the payment would fail in the payment system, which would then have to notify the back-office and the inventory system that that particular purchase order has failed (probably the most efficient way to do this would be to publish a message about a failed payment on an ESB channel to which the other 2 systems are listening. Connecting the payment system directly to the systems it is related to is a tight-coupling and should be avoided. *).
Exception handling is done differently in the 2 implementations: the first implementation deals with the exception right at the source, while the second implements exception handling by passing messages between the systems in the back-ground. The second implementation is also more prone to tight-coupling between systems.
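Here is a small Java sketch of the failed-payment notification in the asynchronous case. The channel is a plain in-memory list of listeners standing in for an ESB channel (so delivery is synchronous here, unlike with a real bus), and all the class names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// The message published when a payment fails.
class PaymentFailed {
    final String purchaseOrderId;
    final String reason;

    PaymentFailed(String purchaseOrderId, String reason) {
        this.purchaseOrderId = purchaseOrderId;
        this.reason = reason;
    }
}

// In-memory stand-in for an ESB channel: the payment system does not know
// who is listening, which avoids coupling it directly to the other systems.
class FailureChannel {
    private final List<Consumer<PaymentFailed>> listeners = new ArrayList<>();

    void subscribe(Consumer<PaymentFailed> listener) {
        listeners.add(listener);
    }

    void publish(PaymentFailed event) {
        for (Consumer<PaymentFailed> listener : listeners) {
            listener.accept(event);
        }
    }
}

class PaymentSystem {
    private final FailureChannel channel;

    PaymentSystem(FailureChannel channel) {
        this.channel = channel;
    }

    // cardExpired is a simplification of whatever check the real system would run.
    void process(String purchaseOrderId, boolean cardExpired) {
        if (cardExpired) {
            channel.publish(new PaymentFailed(purchaseOrderId, "credit card expired"));
        }
    }
}

class FailedPaymentDemo {
    public static void main(String[] args) {
        FailureChannel channel = new FailureChannel();
        // Each system compensates on its own when it hears about the failure.
        channel.subscribe(e -> System.out.println("Inventory: release stock for " + e.purchaseOrderId));
        channel.subscribe(e -> System.out.println("Back office: cancel " + e.purchaseOrderId + " (" + e.reason + ")"));
        new PaymentSystem(channel).process("po-7", true);
    }
}
```

Notice that placeOrder never sees this failure: the compensation happens entirely between the payment system and its listeners.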

One cost involved in the above implementations is the cost of exception handling. In synchronous processing exception handling is done upfront and is usually not expensive since you are handling it at the source, while in asynchronous processing exception handling usually occurs behind the scenes with systems passing each other messages, so you should pay attention to it when designing systems. If you can delegate the whole processing, including failure handling, to a component it probably makes sense to use asynchronous processing. If you find that some failures involve multiple systems it probably makes sense to use synchronous processing.

Another cost involved in the above example is the cost of locking down resources. In the synchronous example this cost is pretty high: you need to start a transaction, start processing, lock down database rows, etc… and roll back that transaction in the event of failure. In the second example this cost is extremely low: you start a transaction, you send some messages and you commit the transaction.
You essentially have these 2 costs and each implementation carries a different one: the asynchronous implementation has a high complexity cost (passing messages about exceptions behind the scenes) while the synchronous implementation has a high physical cost (the cost of locking down resources pending a transaction commit). These cost structures should be kept in mind when designing systems (**).

I would end this post by saying that while most people seem to associate asynchronous processing with low-latency method calls, delayed execution and relaxed time constraints, it is important to bear in mind that asynchronous processing should also be associated with the way you delegate a particular task to a particular system, which is primarily a management issue. What I wanted to stress is that asynchronous messaging forces you to fully delegate processing (including exception handling) to a different component or system. I see this hand-off as just as important as the other things that asynchronous processing is associated with: faster return from the call because you are simply passing a message rather than waiting for its execution, etc…

* One caveat though. It is a good thing to try to keep the number of message types that are sent thru an ESB under control, otherwise you will end up with a Spaghetti Oriented Architecture ;-).

** The present push away from ACID transactions towards systems passing messages between each other would imply that the physical cost is larger than the complexity cost. At the same time ESBs are becoming more and more popular in order to keep the cost of the relationships between various systems low.
Transactions can be viewed as a way to enforce state synchronization between multiple actors: these actors try to carry out a series of actions at the same time and if one of them fails they should all go back to their original state. ACID transactions enforce state synchronization at the time the actions are carried out, however in some cases you can carry out this state synchronization later.
Please check these links for some unusual transaction mechanisms. Some of them (especially XTP) will very likely become de-facto standards:
Are Cross-Service Transactions A Violation of the Autonomous Tenet of Service Orientation?
Xtreme Transaction Processing under SBA terminology
Large-scale distributed transactions
Constrained Tree Schemas (CTS) and applications (CTA) for extreme OLTP (XTP)

Note: This post was written while listening to Mylène Farmer live at Bercy 2006.

P.S. This post is also quite all over the place: exception handling in async interactions vs. synchronous processing, transactions, etc… I guess Mylène Farmer can be quite distracting ;-).

No Comments | Tags: Favorites, Management

23 January 2008 - 16:08Hiring

I was reading the Prefer Design Skills post on Martin Fowler’s bliki and I agree with most of its conclusions. He is spot on about the need to prefer someone with design skills over someone with skills in a particular language.
Design skills carry a lower risk of becoming obsolete than the knowledge of a programming language: programming languages come and go, good design usually stays. So it would follow that when you bring design skills in-house you bring in skills which depreciate slowly, and this is beneficial.

Design skills are largely the outcome of some soft skills (good communication with the customer, easy understanding of a domain, maybe knowledge of a few domains, etc…) that are much needed by any developer. When you bring design skills in-house you also bring those soft skills in. These soft skills will translate into an environment with less friction, better communication with the client, faster ramp-up on new projects for which the person with design skills has domain knowledge, etc…

Hiring a person can be viewed as an investment: you are investing the salary for the probation period and the relationships that this person will make (these relationships may include relationships with your customer).

Hiring a person comes with opportunity costs: you will probably have to refuse the same position to someone else. The opportunity costs of a hire are the skills of the person(s) who has/have been refused the position and the contribution that the refused candidate(s) would have made in the future. It is important to keep these costs down as well.

The opportunity cost of choosing a person with design skills vs. one with language skills is largely the time it takes the design-skills person (I am getting tired of typing this label) to learn the language. I would accept this opportunity cost over the opportunity cost of hiring someone with no design skills because I feel so (sorry, I don’t have time to go into details).

Reading the above you will see a huge bias towards design skills vs. language skills in this post, to the point where you are probably wondering when you should hire a person with strong language skills (or, who knows, whether you should hire such persons at all).
Staying in the same controlled thought-experiment environment, I would add that a positive bias towards a particular set of people tends to raise those people’s salaries: they become more expensive as the demand for them increases faster than their supply. It would follow that at some point the premium associated with design skills would become so large as to dwarf the benefits of employing people with design skills. At that point it would become economically sensible to hire language-skills people.
Another point is that you could lower the opportunity cost of refusing a person with design skills by hiring a person with language skills who is motivated to sharpen his/her design skills. If you come across such a person, the opportunity cost of hiring him/her would be the contribution that the person with design skills would have made while the person with language skills is still in the ramp-up period of sharpening his/her design skills. I would say that this opportunity cost would be more than offset by the relationship that you will develop with the person with strong language skills while mentoring and tutoring him/her.

And this is the last time you will read design skills in this post, I feel like I said it 500 times ;-).

P.S. I guess it shows that I love economics ;-).

No Comments | Tags: Management

21 January 2008 - 21:36Communication problems in dynamic languages

I was reading this post on manageability.org which highlights some of the problems with large projects in dynamic languages. I am not familiar with these problems, never having had to code in a dynamic language on a large-scale project, but I am currently working on a large-scale project in Java with many distributed teams, and in such an environment it is crucial that the code created by different teams is put together fast and reliably. Putting the code from various sources together fast is necessary for efficient collaboration between teams.

Probably the number one problem of distributed teams working on the same project is making sure that the code doesn’t contain references to methods which don’t exist or which have changed. Statically-typed languages solve this problem by definition, at compile time. Dynamically-typed languages probably have some solutions to this problem (I have heard of some people documenting the Ruby code that they write), but their biggest drawback is that these solutions are human-powered and not automated; an automated solution would be something similar to a compiler inspecting your code for invalid method calls. I would say that solving the code conflicts that arise in distributed teams is more expensive in dynamic languages than in static languages, therefore when choosing a dynamic language you should probably stay away from coding in a distributed development environment (*).
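A small sketch of what this looks like in practice, written in Python as a stand-in for a dynamic language (the post itself only talks about Ruby). `BillingService`, `checkout` and `missing_methods` are hypothetical names invented for this example. One team renames a method, the other team’s code still calls the old name, and nothing complains until the call is actually executed. The helper at the end is the kind of check a compiler would give you for free; here somebody has to write it, maintain the list of expected names and remember to run it.

```python
class BillingService:
    """Hypothetical class owned by another team; `charge` was renamed to `charge_card`."""

    def charge_card(self, amount):
        return f"charged {amount}"


def checkout(service, amount):
    # Written against the old API. The module still loads and imports fine;
    # the broken reference only surfaces when this line runs.
    return service.charge(amount)


def missing_methods(obj, expected_names):
    """Return the expected method names that obj does not provide as callables."""
    return [
        name
        for name in expected_names
        if not callable(getattr(obj, name, None))
    ]
```

Calling `checkout(BillingService(), 5)` raises an `AttributeError` at run time, whereas the equivalent Java code would not compile in the first place. `missing_methods(BillingService(), ["charge", "charge_card"])` reports the stale name up front, but only because a person listed `charge` as something the caller depends on.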

One problem that I see with Ruby and all those elegant languages is that they are not very expressive to a large audience. They encourage brevity, but I find that they encourage it at the expense of readability and this narrows their audience (this is why I find it unlikely that Ruby will break out of a small elite). A developer working on an application needs to understand pretty fast what a particular method does rather than marvel at the “beauty” of that particular piece of code. I find that the information stored in Ruby code is harder to extract than the same information stored in Java code. It is “beautiful” (**), but also inefficient and it is more expensive to work with it.

I would say that the biggest challenge in using dynamic languages revolves around collaboration and communication. Statically-typed languages put a smaller cost on communicating through code, while dynamic languages can achieve the same only through some external mechanism. While dynamic languages have more features, this freedom comes at the expense of collaboration. The evidence seems to support this supposition: most development in dynamic languages happens in small, co-located teams where the need for communication is small.

* All the above being said, I remember Martin Fowler talking about agile projects coded by distributed teams which did very well. He didn’t mention what language these teams were using, but given the fact that ThoughtWorks seems to favor Ruby I would assume that these distributed teams were using Ruby as well. It would be interesting to see the steps they took in order to handle these communication problems.

** I personally find it brain-dead to talk about the “beauty” of code because this is not the purpose for which code is written. A statue should be beautiful because this is its purpose, to be enjoyed by whoever looks at it; a piece of code’s purpose is to work as it is supposed to.

Later Edit: The communication and collaboration costs of dynamic languages will define the population of developers that use them and their usage within that population. I would not be too surprised if most Ruby and Groovy usage ends up restricted to small scripts which do not require a lot of collaboration, while a small niche of very productive teams will be able to use them on a large scale.

2 Comments | Tags: Econo-computing, Management