27 October 2009 - 13:13 Non-blocking flows

Recently I was working on a business flow to which we had to add a new requirement: grouping a particular type of transaction under a file. The file had to be unique per day, it had to be created on the fly when the transaction batch starts being processed, and the transactions had to be assigned to it at the end of processing. The first solution that one could think of is to change the flow to check whether the file exists (creating it if it does not) and then to assign the transactions to that trade file.

However, doing only this would pose a concurrency problem: two or more transaction batches may arrive at the same time when no trade file has been created yet. If each batch concurrently checked whether the trade file exists and then concurrently tried to create it, we could end up with duplicate trade files. One way to avoid duplicates is to detect that a trade file needs to be created, allow one of the transaction batches to create it, and block the other batches until it exists. We looked at the costs of blocking and, as they looked pretty small (we would be blocking only once per day, when the file gets created), we decided to go ahead with blocking.
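The blocking approach can be sketched in Java roughly as follows. This is a minimal sketch assuming a single JVM and an in-memory registry; the names TradeFile and TradeFileRegistry are hypothetical, and the real flow would more likely rely on a database lock or a unique constraint than on an in-process map.

```java
import java.time.LocalDate;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the daily trade file.
final class TradeFile {
    final LocalDate day;
    final int id;

    TradeFile(LocalDate day, int id) {
        this.day = day;
        this.id = id;
    }
}

// Hypothetical registry holding at most one trade file per day.
final class TradeFileRegistry {
    private final ConcurrentMap<LocalDate, TradeFile> filesByDay = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // computeIfAbsent applies the creation function at most once per key;
    // batches racing on the same day wait until the winner has created the file.
    TradeFile getOrCreate(LocalDate day) {
        return filesByDay.computeIfAbsent(day, d -> new TradeFile(d, created.incrementAndGet()));
    }

    // Number of files created so far (for illustration only).
    int createdCount() {
        return created.get();
    }
}
```

With this shape, the only contention is on the first access for a given day; every later batch finds the file already in the map.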

However, this approach clearly doesn’t scale. We implemented it because the conditions for blocking occur very rarely (as I was saying, once per day); it would not be feasible under higher contention. We looked at some non-blocking alternatives, and a good one seems to be to let each transaction batch check whether the trade file exists and, if not, create it on the fly (without blocking). At the end of transaction processing the flow would send a message downstream saying that some data may be inconsistent (namely that some files may have duplicates and that transactions may be assigned to duplicate files), and we would establish a procedure for repairing the transactions (if necessary). This would allow for non-blocking flows and higher throughput, but at the expense of a period of time in which data is inconsistent (in our case, some transactions may be assigned to duplicate trade files until the duplicates get fixed).

If inconsistent data is OK for the business and the rest of the application (it could be that these repair procedures as well as inconsistent data affect other parts of the application) and if blocking flows are creating significant performance problems, then allowing data to be inconsistent for a certain period of time, while providing a mechanism for detecting and repairing inconsistencies, would probably solve the problem.
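A repair procedure of the kind described above might look like the following Java sketch. It assumes that duplicates are resolved by keeping the trade file with the lowest id for each day; that rule, and the names AssignedTransaction and DuplicateTradeFileRepair, are hypothetical and only serve the illustration.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical transaction that has been assigned to a trade file.
final class AssignedTransaction {
    final String id;
    int tradeFileId;

    AssignedTransaction(String id, int tradeFileId) {
        this.id = id;
        this.tradeFileId = tradeFileId;
    }
}

final class DuplicateTradeFileRepair {
    // dayByFileId maps each trade file id to the business day it was created for.
    // Returns the number of transactions that had to be re-assigned.
    static int repair(Map<Integer, LocalDate> dayByFileId, List<AssignedTransaction> transactions) {
        // Assumed rule: the file with the lowest id survives for each day.
        Map<LocalDate, Integer> survivorByDay = new HashMap<>();
        for (Map.Entry<Integer, LocalDate> e : dayByFileId.entrySet()) {
            survivorByDay.merge(e.getValue(), e.getKey(), Integer::min);
        }
        int moved = 0;
        for (AssignedTransaction t : transactions) {
            int survivor = survivorByDay.get(dayByFileId.get(t.tradeFileId));
            if (t.tradeFileId != survivor) {
                t.tradeFileId = survivor;
                moved++;
            }
        }
        // Drop the duplicate files now that no transaction points at them.
        dayByFileId.keySet().retainAll(survivorByDay.values());
        return moved;
    }
}
```

The period between the creation of a duplicate and the run of this repair is the window in which the data is inconsistent.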

Another solution to this problem would be to detect messages which may cause blocking and create a new stage in the flow which deals with such messages.

No Comments | Tags: Development, Favorites

27 May 2009 - 14:23 DSLs and APIs

In a previous post I was saying that some technological issues are simply proxies for organizational issues, and that whether a technology thrives or fails within an organization is determined not exclusively by the technology itself but also by the organization. I ended the post saying:
For example, you may find out that DSLs will not work in a particular setting not because DSLs are bad, but because the organization within which these DSLs are implemented and used simply doesn’t warrant their success.

I will get into an example where using DSLs is probably not a good fit because of organizational issues rather than inner strengths and weaknesses. Let’s say that you have some distributed teams working on the same application and that these distributed teams are led by a team of architects which need to create an environment in which these distributed teams work together. I find it pretty hard to see how this environment could be constructed using DSLs, particularly iteratively-developed DSLs, because changes in the DSL would have to be propagated across multiple teams in dispersed locations which would need to be re-trained for every iteration. Compare this approach with creating an API that would be released every so often and for which re-training costs are very small.

I have the impression that the main cost of DSLs is the distance between the place where the DSL is produced and the place where the DSL is consumed, and that DSLs tend to be either specific to the location where they are produced or widely known and used within a community (in the second case the distance between the DSL producer and the DSL consumer is compensated by “mind-share”). I would say that the main difference between a DSL and an API is that DSLs are more tightly bound to the domain (by definition) and that part of the cost of using a DSL is the cost of transferring the knowledge about that domain between various stake-holders. This would mean that domain experts can adapt to a DSL targeting their domain faster than a typical developer who is new to the domain, and that domain experts would be more productive. The difference in productivity between various members of the work-force means that the work-force composition should partly drive the decision to choose a DSL or APIs on a particular project. Regarding the distance between DSL production and DSL consumption, this paper discussing how distance affects knowledge transfers is probably a good read.

Other issues with DSLs, such as developer lock-in (a developer working with a DSL can fall out of sync with the rest of the IT landscape), the lack of language design skills on the IT job market, and the problem of capturing domain knowledge correctly in large or rapidly changing domains, are not explored in this post.

I end this post hurriedly…

Later Edit: I was watching this interview with Charles Simonyi and I came away with these impressions:
If the DSL comes from an already existing language, such as domain experts’ notations, then there are chances that the DSL will be a good language (this addresses the issue of the lack of language design skills on the IT job market). The problem of creating a new language is done away with by re-using a language already in use (the domain experts’ notations) and creating a new way to implement it.
Regarding domains which are changing rapidly, Charles Simonyi says that by separating the notation from the actual domain schema it becomes pretty easy to change the notation (the DSL) without affecting the underlying domain schema. Creating a DSL becomes an iterative process.
End-user programming is to remain a pipe-dream because of the difference between the skills needed for programming and the skills needed for working within the domain. DSLs seem to engage the domain expert more in the process of software creation rather than turn domain experts into programmers. Domain experts’ contribution to software creation thru DSLs may simply be to communicate the domain better to a programmer. Language workbenches would create a new division of tasks according to skills.

LLE: I watched this presentation by Intentional Software on domain workbenches and I came away with these impressions:
In an environment where domain workbenches are used it looks like the programmer will be involved in creating projections to an operational environment. The developer will focus on the systems while the domain experts will focus on the business logic. This division of tasks addresses the issues arising from developers’ lack of enthusiasm about the domain.
It looks like domain workbenches should be used in large and rapidly changing domains where the transfer of knowledge from domain experts to developers is very costly due to the size of the domain and the speed at which it moves (this answers one of my questions above).

I think that the argument that I was making above that DSLs will be location-specific still applies even with a domain workbench in the picture because knowledge transfers with high costs will not appear in the creation of the DSL itself (the DSL being just one notation for an underlying domain schema, one notation among many, it can be created and used locally) but rather in the creation of the domain schema itself (which may end up being a collaborative effort carried out among people in various locations).

As I was saying above, I have the impression that in the near future domain workbenches and DSLs will be used where the costs of knowledge transfers from domain experts to developers are large. If the costs of creating languages start to drop and these domain workbenches become ubiquitous, then it is possible that a complete separation will appear between domain logic (stored in domain code and expressed thru various notations) and its implementation in a system (which appears to be essentially a projection of the domain code into an operational environment). It would be interesting to work with Intentional Software’s domain workbench too.

We will have to wait and see…

L3E: This is another presentation on the Domain Workbench.

No Comments | Tags: Favorites, Management

1 December 2008 - 13:23 Various takes on SOA

I have avoided reading and writing about SOA lately because SOA has attained buzzword status and the high number of articles written about it makes it pretty difficult to follow. I also see so much repetition of the same terms and concepts that pretty much every article I have read in the last 4 months feels stale once I go over the first 2 lines.

Against this background I have read 2 interesting articles on SOA: one article by Martin Fowler about using open-source techniques in order to implement collaboration between a consumer service and a supplier service, and one article about abandoning the mindset which focuses on IT and focusing instead on the relationships between various services thru contracts and policies (contracts and policies are organizational issues and not technology issues).

I found Martin Fowler’s article refreshing, even though I have to say that I disagree with it. From what I understand from the article, having developers commit code to a service that you need to connect to carries high interaction costs (basically the developers need to understand the partner service, need to be coordinated with the rest of the developers, etc…), and these interaction costs drive down the number of services with which you can partner effectively. Martin Fowler takes the issue of partnering with a service even further by having developers from the service consumer become full-rights committers to the codebase of the partner service. When your developers are committers to a different service you have to wonder which service these developers are working on. The service consumers and the service providers become very tightly coupled, not thru the interfaces they use for communicating (REST, WS-*, wire protocols, etc…) but thru the developers that these 2 services share. I would say that Martin Fowler’s open-source-like approach to service interactions does not scale to a larger number of services. I could go a bit into the differences between open-source projects and your typical enterprise application (I think that the most important one is that on an open-source project features are chosen by boiling down competing features to the lowest common denominator chosen by the OS project participants, while your typical enterprise application chooses its features in order to boost its revenues), differences which pretty much explain why OS-like projects are very rare in the enterprise computing world, but I don’t really have the time.

I have to say that when comparing the above OS approach to SOA with the typical vendor approach to SOA you see big differences: a typical vendor assumes that services are out there in a registry and that a business analyst can create new processes and new value by dragging and dropping various services into an IDE storing references to firm-wide services. This vision does not have much in common with the real world for two important reasons: 1) there are interaction costs associated with consuming each service, interaction costs which are not expressed in a service IDE, and 2) in order to manipulate services at a firm-wide scope you need to master knowledge at the same scope. Possessing knowledge at this scope is nearly impossible; I personally have seen rather the reverse: business analysts getting so specialized that their scope becomes more and more focused. As far as I see it, in order to achieve the SOA-nirvana championed by your typical IT vendor you need significant disruptions in your work-force.
And this brings me to the MWD article, which echoes some thoughts that I wrote a while back when I was saying that interactions between IT services will be driven by the interactions between the BUs implementing these services. SOA is not about empowering some business analyst to draw some funny diagrams on a board and re-engineer information flows at the firm level, but rather about how you create relationships and interactions, how you create incentives for adhering to rules and penalties for straying away from them, how you delegate managing an interaction to the parties involved in it, how you identify and consolidate common tasks into shared services and how you allocate tasks which are particular to a small set of parties, how you deal with who loses in these shifts, etc…

As you can see, all these concerns don’t have much to do with IT, being first and foremost management concerns. The only thing that IT can do is help these management issues get resolved, but not resolve them. IT brings some capabilities which can be used: service registries (which consolidate the view of and access to IT assets), ESBs (which allow for controlling data flows), policy standards (which allow the automation of defining and enforcing policies), etc… These capabilities should not be mistaken for their usage or their ends.
Ultimately, all the IT issues on SOA will be pushed to the back and management will take over. Ideally this field will become so specialized and its audience so narrow that we will get rid of another tiresome buzzword…

Later Edit: As you can guess, this post has been written in about half a dozen different sessions, each session adding a few more lines to what was written before. I don’t have much time to write anymore (and I guess it shows), but I felt compelled to comment on those 2 articles because I thought they stand out from the rest of the articles written on SOA.

No Comments | Tags: Favorites, Management

3 November 2008 - 14:24 Task allocation and designing interactions

As I was saying in my previous post, a lot of technological issues (such as the debate of REST vs. WS-*) are actually organizational issues: at the bottom of them you will find that the process of communication has been broken down into tasks (namely the task of the payload-to-processor mapping) and that those tasks have been allocated (externalized or internalized) according to the transaction costs of each particular environment.

I would follow this by saying that a lot of time will be saved if the interactions between various parties are broken down into tasks and those tasks are allocated according to the nature of the organization(s) within which these interactions will be carried out. This applies not only to message passing but to any other environment in which various entities engage in collaboration.
For example, you may find out that DSLs will not work in a particular setting not because DSLs are bad, but because the organization within which these DSLs are implemented and used simply doesn’t warrant their success.

No Comments | Tags: Favorites, Management

16 October 2008 - 19:31 REST and WS - part 5

One of the best presentations that I have seen lately is Mark Little talking about REST and WS at QCon London 2008, or rather about the differences between the uniform interfaces and the specific interface. Mark Little gets right at the bottom of the differences between REST and WS, and these differences are not all differences between architectural styles, but rather differences in organizations.

One of the best interpretations of the differences between REST and WS that I have seen is the one that Mark makes in this presentation, which I will try to reproduce below. It concerns the division of the task of mapping a message sent to an entity (could be a service, could be a URL) to the component/processor within that entity that will process it. When you send a message to a specific interface you specify the method and the component that will process that message, so the message-to-processor mapping is done by the client. When you send a message to a uniform interface, on the other hand, you need to attach some meta-data to that message, which the entity implementing the uniform interface uses for mapping the message to its appropriate processor; the message-to-processor mapping is therefore done by the entity.

The message-to-processor mapping is externalized in the case of specific interfaces while the uniform interface internalizes it. These different mappings come with different transaction costs: in the case of the external mapping the costs are synchronizing the client with changes in the specific interface, while in the case of the internal mapping the costs are adapting the uniform interface’s mapper to changes in the meta-data (a new type of message should be mapped to a new processor).
Now, if we apply economics to the task of message-to-processor mapping we start to get some interesting results. According to Coase’s theory of the firm you can externalize an activity if the transaction costs of carrying out that activity are low enough, and if the transaction costs are high then that activity is better internalized. It would follow that when the transaction costs of this mapping are low you can externalize it via a specific interface, otherwise you would be better off internalizing it via a uniform interface.
The transaction costs of the specific interface are the synchronization of distributed clients, and this could be resolved either thru versioning, or by not modifying the specific interface, or by standards. If you can keep these costs down it follows that you can delegate the message-to-processor mapping efficiently to the client, else you are better off with a uniform interface. Keeping these costs down also implies that the number of methods that get exposed is kept down in order to limit the number of changes (more methods, more opportunities for changing them). Generally speaking, interaction via specific interfaces is best limited to a few coarse-grained components, between a few participants and with components which do not change very often.
The above synchronization costs run counter to scale in the number of clients: as the number of clients starts to grow you start to have problems synchronizing all of them, and eventually the transaction costs of using the specific interface will become so high that it will not make sense to use it. One corollary of this is that the uniform interface is far better at the scale of the internet, and this is probably the main reason why it is used at this scale.
The above will provide a pretty good guide to an architect trying to figure out whether to use a specific interface or a uniform one, because their use is not driven by technology, but rather by the organization(s) within which they are employed.
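To make the two mappings concrete, here is a minimal Java sketch. It is not REST or WS-* proper: OrderService, UniformEndpoint and the "messageType" string are hypothetical stand-ins for a specific interface, a uniform interface and the meta-data attached to a message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Specific interface: the client picks the method, so the client performs
// the message-to-processor mapping and must be kept in sync with this contract.
interface OrderService {
    String placeOrder(String payload);
    String cancelOrder(String payload);
}

// Uniform interface: a single entry point. The entity itself maps each
// message to a processor, using the meta-data that travels with the message.
final class UniformEndpoint {
    private final Map<String, Function<String, String>> processorsByType = new HashMap<>();

    // Supporting a new type of message means adapting the mapper, not the clients.
    void register(String messageType, Function<String, String> processor) {
        processorsByType.put(messageType, processor);
    }

    String handle(String messageType, String payload) {
        Function<String, String> processor = processorsByType.get(messageType);
        if (processor == null) {
            return "unsupported: " + messageType;
        }
        return processor.apply(payload);
    }
}
```

Adding a third operation to OrderService forces every client to be updated or versioned, while adding a third message type to UniformEndpoint only requires one more register call on the entity’s side.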

Another interesting point in his presentation was the effort for creating a REST-ful standard for carrying out distributed transactions and the need for agreement between multiple parties. As I was saying in a previous post about REST and WS, the differences between these 2 camps are more of an organizational nature than anything else. Mark touched upon the fact that even if the need for carrying out distributed transactions via REST exists there is still no standard for doing so, and that the REST community seems pretty much happy with implementing low-level infrastructure and not higher-level, and higher-value, components. I find this a pity and I have the impression that this need will be fulfilled only if it rises to a level at which major corporations find it profitable to get involved. The REST community appears simply way too fragmented to be able to carry out this effort on its own…

Anyway, go ahead and watch the presentation, it is the most thoughtful presentation on distributed computing that I have seen in a while.

No Comments | Tags: Econo-computing, Favorites, Management

10 June 2008 - 12:20 Grid design patterns - de-normalized data

I am watching with a great deal of interest developments in the grid and large scale computing environment because I always found distributed computing interesting.

One very interesting thing that I come across pretty often is that in large scale computing data tends to get de-normalized: it grows so large that you cannot fit the entities and the relationships between entities in the same database (this InfoQ article is pretty good).
Let me give an example: suppose that you have a table of users and a table of messages between users having these columns:
users: userID int, firstName String, lastName String, email String
user_messages: fromUserID int, toUserID int, messageText String.
In the initial design a message from user A to user B would have only one record in user_messages and each user will be able to get the messages it has sent and the messages it has received from this table.
Now let’s say that the number of users sky-rockets to the point where you need to partition your database horizontally into a series of identical replicas. The problem that you will face now is where to store the data that is shared between these replicas, namely the table user_messages. You cannot store it in the replica hosting user A’s data, nor in the replica hosting user B’s data, nor in its own instance, because it will grow too large and because you will need to carry out a join over 2 physically remote databases. The solution is to drop user_messages in favour of received_user_messages (from_user_name, from_user_email, message_text) and sent_user_messages (to_user_name, to_user_email, message_text), which will store the messages that a user has received and the ones that it has sent. Each database replica will have its own instance of these 2 tables, each mapped to the user table. As you can see we are not storing the userID in the message tables, but rather the user information which was previously retrieved via joins.
Sending a message from user A to user B in this environment typically involves updating 2 tables in 2 separate database instances, which usually is treated as a distributed transaction. However, this distributed transaction is very costly, so sometimes you resort to updating one table and sending a message to update the second table asynchronously (the overhead for delivering a JMS message is way less than the overhead for locking rows in the second database). In the case of updating the second table asynchronously you are still enforcing the relationships between the user table and the user message tables, but at a later point in time.
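The "update one table now, update the other one later" idea can be sketched in Java as below. This is an in-memory illustration only: Shard stands in for a database replica, the ArrayDeque stands in for a JMS queue, and all class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for one database replica holding the two de-normalized tables.
final class Shard {
    final List<String> sentUserMessages = new ArrayList<>();
    final List<String> receivedUserMessages = new ArrayList<>();
}

// Hypothetical queued instruction to update the recipient's replica later.
final class PendingDelivery {
    final Shard target;
    final String fromUserEmail;
    final String messageText;

    PendingDelivery(Shard target, String fromUserEmail, String messageText) {
        this.target = target;
        this.fromUserEmail = fromUserEmail;
        this.messageText = messageText;
    }
}

final class MessageSender {
    // Stand-in for the JMS queue between the two replicas.
    private final Queue<PendingDelivery> queue = new ArrayDeque<>();

    // Synchronous part: write to the sender's replica only, then enqueue the rest.
    void send(Shard senderShard, Shard recipientShard, String fromEmail, String toEmail, String text) {
        senderShard.sentUserMessages.add(toEmail + ": " + text);
        queue.add(new PendingDelivery(recipientShard, fromEmail, text));
    }

    // Asynchronous part: normally run by a separate consumer at a later point in time.
    int deliverPending() {
        int delivered = 0;
        PendingDelivery d;
        while ((d = queue.poll()) != null) {
            d.target.receivedUserMessages.add(d.fromUserEmail + ": " + d.messageText);
            delivered++;
        }
        return delivered;
    }
}
```

Between send and deliverPending the sender’s replica already shows the message while the recipient’s does not, which is exactly the delayed enforcement of the relationship described above.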
De-normalized data comes with synchronization costs: when you are updating the user information on the users table you will need to propagate the changes to received_user_messages and sent_user_messages so that the data stored in these tables will be up-to-date. This could be done via asynchronous processing as well, sending a message about data being changed on an ESB and having the concerned parties listening for and processing it. Synchronization costs should be watched very carefully because they could spawn a very high number of messages. Ideally we would synchronize data which is updated rarely (such as user information) in order to keep the number of synchronization messages down. Carrying out synchronization procedures in batches could be a way to deal with a large number of synchronization messages with the side effect that it may increase the latency with which the synchronization takes place.
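One way to batch synchronization messages is to coalesce the changes per user and propagate only the latest value when the batch is flushed. The Java sketch below assumes that only the e-mail address is synchronized; UserSyncBatcher and its methods are hypothetical names used for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical batcher that collects user changes between two synchronization runs.
final class UserSyncBatcher {
    private final Map<Integer, String> latestEmailByUserId = new LinkedHashMap<>();

    // Later changes to the same user overwrite earlier ones, so a user who
    // changes the e-mail three times still costs one synchronization message.
    void recordChange(int userId, String newEmail) {
        latestEmailByUserId.put(userId, newEmail);
    }

    // Returns the changes to propagate to the message tables and starts a new batch.
    Map<Integer, String> flush() {
        Map<Integer, String> batch = new LinkedHashMap<>(latestEmailByUserId);
        latestEmailByUserId.clear();
        return batch;
    }
}
```

The trade-off is the one mentioned above: fewer messages, but the message tables stay stale until the next flush.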

Large scale computing typically seems to revolve around asynchronous processing (because in transactions message passing is cheaper than database access) and de-normalized data, along with the overhead implied by it. The relationships previously enforced by normalizing data are usually enforced by passing JMS messages which carry out the data changes asynchronously. This design is driven primarily by the large volume of data, which cannot be serviced by only one database, and by the costs of carrying out a distributed transaction spanning 2 databases.
This is one design that I see emerging for large scale computing: de-normalized data along with asynchronous processes which are enforcing the relationships between entities via message passing.

It remains to be seen if this type of grid architecture will prevail in the future. Working against it is the emergence of multi-core processors, which would allow for scaling up cheaply as envisaged by Brian Goetz in this article. If chips with hundreds of cores and with large amounts of memory become reality, scaling out could end up costing more than scaling up and all the above could become history pretty fast. Scaling out will continue to make sense in some environments with humongous amounts of data (think Amazon, Google, eBay, etc…) but for the current fastest growing segment in grid applications, typically applications processing moderately large amounts of data, it may make sense to scale up once chips with a large number of cores become a reality.

To watch…

P.S. These shifts in data processing (moving to a grid-type architecture in order to accommodate the increase in data and then back to a single-box architecture because of the appearance of chips with a large number of cores) make a pretty good case for abstraction in order to lower the costs of transition from one architecture to another. Ideally the architecture and the environment in which an application runs should not affect the business logic of that application. One way to insulate your application’s business logic from these infrastructure issues is by abstraction.

No Comments | Tags: Development, Favorites

17 March 2008 - 17:22 What is missing in OS testing tools

I was watching this infoQ presentation by Alexandru Popescu and Cedric Beust on testing when I realized that a big market for testing is seriously neglected.

Regarding my use of testing I would say that JUnit pretty much fills my bill and I don’t see a need to move to TestNG. I was watching the presentation not so much to learn more about TestNG as to get some exposure to the market for testing products (*). The features which were presented, and which were said to be in high demand from users, were the ability to define test groups, to test data connectivity and to define dependencies between tests (**). Pretty fair: I would take this functionality to be a logical progression from the ability to run a bunch of tests at random, as JUnit lets you do. You are basically starting to look at the tests you run from a higher level, and you start looking for ways to aggregate these tests into higher-level constructs, such as test groups, which can then be manipulated according to various needs.
So far I would say that TestNG is looking at ways to manage the complexity coming out of a high number of tests. TestNG is for test-heavy shops, where testing is considered a concern on par with or close to development. Not a bad thing, and I am pretty sure that TestNG covers some functionality which is in high demand by Java developers.

There is, however, one need for testing that so far is largely unsatisfied and sorely missed: the need to test workflows or series of events. Let’s say that you have an application that is a series of MDBs, each accepting messages, transforming them and then outputting them to the next MDBs. You would want to be able to test this application end-to-end.
Let’s say that you have an application that receives market trades, needs to process them and then transform them in order to send them downstream to tax systems, settlement systems, etc… You would want to put thru a trade, set up in a certain way, and then trace its execution thru these flows and determine whether, at what stage and in what shape this trade has reached the Application-to-Tax-System gateway.
When I think about work-flow testing I usually think about defining a message, inputting this message into a work-flow engine and then defining interceptors to see how the original message has been manipulated at various stages. You may need to test both the work-flow itself (to see if the message has been going thru the work-flow that it needed to go thru), the end-result (to see where the message has been forwarded to and in what shape) as well as how it behaves at various stages in this workflow (if needed). I see work-flow testing primarily concerned with interception of messages (and this would probably be a great use of AOP) and with the possibility to correlate messages passing thru this work-flow with the original message.
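The kind of test harness described above could look roughly like the Java sketch below. It is a synchronous, in-memory stand-in: a real MDB-based flow would be asynchronous and the interception would more likely be done with AOP or a similar mechanism. Stage, TracingWorkflow and the trace format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical work-flow stage: a name plus a message transformation.
final class Stage {
    final String name;
    final UnaryOperator<String> transform;

    Stage(String name, UnaryOperator<String> transform) {
        this.name = name;
        this.transform = transform;
    }
}

// Hypothetical work-flow runner that intercepts the message after every stage.
final class TracingWorkflow {
    private final List<Stage> stages = new ArrayList<>();
    private final List<String> trace = new ArrayList<>();

    TracingWorkflow stage(String name, UnaryOperator<String> transform) {
        stages.add(new Stage(name, transform));
        return this;
    }

    // The correlation id ties every intercepted message back to the original one.
    String run(String correlationId, String message) {
        String current = message;
        for (Stage s : stages) {
            current = s.transform.apply(current);
            trace.add(correlationId + "@" + s.name + "=" + current);
        }
        return current;
    }

    List<String> trace() {
        return trace;
    }
}
```

A test can then assert on three things separately: the path the message took (the stage names in the trace), its shape at each stage, and the end result returned by run.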

OS testing tools so far are limited to synchronous testing. I wonder at what point the need for work-flow testing will become so pressing, and the demand for it so great, that one OS shop will start doing something about implementing a framework for testing workflows. Then we could use Event-Driven-Testing for testing an application written in an Event-Driven-Architecture manner…

* I know that trying to form an opinion on a market such as the market for OS testing tools from a vendor presentation is a pretty risky business, what you get from vendor presentations is usually distorted because of bias and time constraints but I will assume that this presentation gave a fair image of what users want from testing tools.

** From what I know TestNG is a lot more than some annotations that give you the possibility to form test-groups and test dependencies. However, I would say that these issues are probably considered more important and more aligned with the market for OS testing tools since they were the ones which were included in this presentation.

P.S. If I were to choose one enterprise concern which is not addressed by the Spring stack I would choose work-flow. Spring has modules for integration, batch, transaction management, security management, connectivity to various end-points, etc…, but it is missing this capability. From what I see work-flows are used pretty heavily in the enterprise space and for now I think they are mostly implemented either by in-house solutions or by commercial solutions. OS could probably make a contribution in this space as well. And if it does, it should probably try to give the developer the ability to test work-flows, or at least make it easy for him/her.

P.P.S. The comments by Alan Keefer on this thread reinforce my beliefs that we need to carry out tests at every level, from high-level to unit test. For a work-flow based system this would mean that we need to test the work-flows themselves and not only the units making them up.

Later edit: There are actually a few OS workflow tools: OSWorkflow, the ones at the bottom of this article and probably a few others. I should try them at some point.

No Comments | Tags: AOP, Development, Favorites

1 February 2008 - 20:56 Asynchronous processing and OOP

I was reading this post on InfoQ about how good object-oriented programming closely mimics asynchronous processing. It is a pretty interesting post, even though it is not very organized: the author surfs around a few concepts, finds some interesting relationships between them and then ends the article.

The post looks for similarities between implementing some piece of functionality in an asynchronous manner and implementing the same piece of functionality with methods that do not return values. It pretty much finds these similarities, but there are some important differences which we will explore now.
Let’s take a look at some code: suppose that you have a purchase order component which has the method placeOrder for placing an order:
public class PurchaseOrderBean {

    public void placeOrder(PurchaseOrder po) throws PurchaseOrderException {
        try {
            // update its status in the DB
            changePurchaseOrderStatus(po);

            // execute payment
            paymentBean.performPayment(po);

            // update the inventory
            inventoryBean.updateInventory(po);

            // integrate with the back office
            backOfficeBean.notifyBackOffice(po);
        } catch (BackOfficeException ex) {
            // handle BackOfficeException, possibly rolling back the current transaction
        } catch (InventoryException ex) {
            // handle InventoryException, possibly rolling back the current transaction
        } catch (PaymentException ex) {
            // handle PaymentException, possibly rolling back the current transaction
        }
    }
}

As you can see in this case the method placeOrder simply tells the inventory bean, the payment bean and the back-office bean to do their work.

Now let’s implement the same method in an asynchronous manner using JMS and QueueSenders:
public class PurchaseOrderBean {

    public void placeOrder(PurchaseOrder po) throws PurchaseOrderException {
        try {
            // update its status in the DB
            changePurchaseOrderStatus(po);

            // execute payment
            queueSenderToPaymentJMSQueue.send(formatMessage(po));

            // update the inventory
            queueSenderToInventoryJMSQueue.send(formatMessage(po));

            // integrate with the back office
            queueSenderToBackOfficeJMSQueue.send(formatMessage(po));
        } catch (JMSException ex) {
            // handle JMSException. Usually it means rolling back the transaction and sending this exception upstream.
        }
    }
}

As you can see the biggest difference between the 2 implementations is exception handling. In both cases you are telling a component (in the first case a bean, in the second case a different system) to carry out its work, but only in the first case do you get to deal with the exceptions. One big difference that I see is that with asynchronous processing you delegate the whole implementation of a particular functionality, including its failures, while with synchronous processing you can leave dealing with failure to the method that orchestrates the interactions between these components.
I consider this difference to be important when designing systems because it essentially splits tasks and responsibilities.
The post was right on the money about the need to avoid methods returning values which have to be dealt with in the calling methods, because coding this way encourages you to write components to which a task can be fully delegated. Having to deal with methods that return values basically means that you need to interpret these values outside of the method, and this means that not everything was delegated to the method. Sometimes you need to code this way though…
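To make the point about return values concrete, here is a minimal sketch that contrasts the two styles. All the names in it (CreditCard, AskingPaymentService, TellingPaymentService, the -1 status code) are hypothetical and are not taken from the InfoQ post; the log list only stands in for whatever the component would really do with a failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is only an illustration.
class CreditCard {
    private final boolean expired;
    CreditCard(boolean expired) { this.expired = expired; }
    boolean isExpired() { return expired; }
}

// Style 1: the method returns a value that the caller has to interpret.
class AskingPaymentService {
    int pay(CreditCard card) {
        return card.isExpired() ? -1 : 0; // the caller must know what -1 means
    }
}

// Style 2: the method returns nothing; the outcome is dealt with inside the component.
class TellingPaymentService {
    private final List<String> log = new ArrayList<>();

    void pay(CreditCard card) {
        if (card.isExpired()) {
            log.add("payment rejected: card expired");
        } else {
            log.add("payment accepted");
        }
    }

    List<String> log() { return log; }
}
```

In the first style part of the task (interpreting the status code) stays with the caller; in the second the whole task, failure included, has been handed over.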

Now let’s dig a bit into the way exceptions are handled in these 2 pieces of code. Let’s suppose that the payment fails for this particular purchase order because the credit card is expired. In the first example all processing would basically stop at this step: we would roll back the transaction and report this error upstream.
In the second example it is a bit more complicated, because the payment would fail in the payment system, which would then have to notify the back-office and the inventory system that the particular purchase order has failed (probably the most efficient way to do this would be to publish a message about a failed payment on an ESB channel to which the other 2 systems are listening. Connecting the payment system directly to the systems it is related to is tight coupling and should be avoided. *).
Exception handling is done differently in the 2 implementations: the first implementation deals with the exception right at the source while the second implements exception handling by passing messages between the systems in the background. The second implementation is also more prone to tight coupling between systems.
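The failed-payment notification described above could look roughly like the following sketch. It is only an illustration under stated assumptions: FailureChannel is a hypothetical in-memory stand-in for an ESB channel (a real system would more likely use a JMS topic or similar), and the system classes and their method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for an ESB channel.
class FailureChannel {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void subscribe(Consumer<String> listener) { listeners.add(listener); }

    // Deliver the id of the failed purchase order to every subscriber.
    void publishPaymentFailed(String purchaseOrderId) {
        for (Consumer<String> listener : listeners) {
            listener.accept(purchaseOrderId);
        }
    }
}

class InventorySystem {
    final List<String> releasedOrders = new ArrayList<>();
    void onPaymentFailed(String purchaseOrderId) { releasedOrders.add(purchaseOrderId); }
}

class BackOfficeSystem {
    final List<String> cancelledOrders = new ArrayList<>();
    void onPaymentFailed(String purchaseOrderId) { cancelledOrders.add(purchaseOrderId); }
}

class PaymentSystem {
    private final FailureChannel channel;
    PaymentSystem(FailureChannel channel) { this.channel = channel; }

    // The payment system knows only the channel, not who is listening to it.
    void process(String purchaseOrderId, boolean cardExpired) {
        if (cardExpired) {
            channel.publishPaymentFailed(purchaseOrderId);
        }
    }
}
```

The inventory and back-office systems would subscribe with channel.subscribe(inventory::onPaymentFailed) and so on; the payment system never references them directly, which is what keeps the coupling loose.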

One cost involved in the above implementations is the cost of exception handling. In synchronous processing exception handling is done upfront and is usually not expensive, since you are handling it at the source. In asynchronous processing exception handling usually occurs behind the scenes, with systems passing each other messages, so you should pay attention to it when designing systems. If you can delegate the whole processing, including failure handling, to a component, it probably makes sense to use asynchronous processing. If you find that some failures involve multiple systems, it probably makes sense to use synchronous processing.

Another cost involved in the above example is the cost of locking down resources. In the synchronous example this cost is pretty high: you need to start a transaction, start processing, lock down database rows, etc… and roll back that transaction in the event of failure. In the second example this cost is extremely low: you start a transaction, you send some messages and you commit the transaction.
You essentially have these 2 costs and each implementation pays a different one: the asynchronous implementation has a high complexity cost (passing messages about exceptions behind the scenes) while the synchronous implementation has a high physical cost (the cost of locking down resources pending a transaction commit). These cost structures should be kept in mind when designing systems (**).

I would end this post by saying that while most people seem to associate asynchronous processing with low-latency method calls, delayed execution and relaxed time constraints, it is important to bear in mind that asynchronous processing should also be associated with the way you delegate a particular task to a particular system, which is primarily a management issue. What I wanted to stress is that asynchronous messaging forces you to fully delegate processing (including exception handling) to a different component or system. I see this hand-off as just as important as the other things that asynchronous processing is associated with, such as a faster return from the call because you are simply passing a message rather than waiting for its execution.

* One caveat though. It is a good thing to try to keep the number of the message types that are sent thru an ESB under control, otherwise you will end-up with a Spaghetti Oriented Architecture ;-).

** The present push away from ACID transactions towards systems passing messages between each other would imply that the physical cost is larger than the complexity cost. At the same time ESBs are becoming more and more popular in order to keep the cost of the relationships between various systems low.
Transactions can be viewed as a way to enforce state synchronization between multiple actors: these actors try to carry out a series of actions at the same time and if one of them fails they should all go back to original state. ACID transactions are enforcing state synchronization at the time the actions are carried out, however in some cases you can carry out this state synchronization later.
Please check these links for some unusual transaction mechanisms. Some of them (especially XTP) will very likely become de-facto standards:
Are Cross-Service Transactions A Violation of the Autonomous Tenet of Service Orientation?
Xtreme Transaction Processing under SBA terminology
Large-scale distributed transactions
Constrained Tree Schemas (CTS) and applications (CTA) for extreme OLTP (XTP)

Note: This post was written while listening to Mylène Farmer live at Bercy 2006.

P.S. This post is also quite all over the place: exception handling in async interactions vs. synchronous processing, transactions, etc… I guess Mylène Farmer can be quite distracting ;-).

No Comments | Tags: Favorites, Management

17 January 2008 - 19:47 Java now and in the future

If you read most of what is published today about the Java platform, it seems that the future of Java doesn’t look very good at the moment, as it keeps losing battle after battle with the movement behind dynamic languages such as Ruby and PHP. At the same time improvements to the language which could give it some boost seem to be badly implemented: most people are unhappy with the implementation of generics, and the proposed closure enhancement looks pretty horrific.

You have to wonder what will happen to Java in the future. Will it disappear and turn into a dinosaur that could not adapt to a changing IT environment, or will it manage to survive the problems that it has at this time? A good look at the Java language would have to consider the language itself and the libraries/frameworks built under it, the contributors (both to the language features and to its libraries/frameworks), as well as the users of the language and its libraries, i.e. the developers that create Java applications.

The different types of contributors to the Java language. I would start by looking at the contributors and I would split them into 2 camps: corporate contributors and non-corporate contributors (*). Looking at the chief contributions these 2 types of contributors make to the Java platform, I would say that the corporate contributors are mostly active contributing libraries and frameworks and that the non-corporate contributors are contributing Java language variants such as Groovy and Scala (**). The split between contributors seems to mirror the split between the language itself and the libraries/frameworks being built under it. This is an important point.

It is important to look at the motivations of each contributor type for contributing to the Java platform: the corporate entities are mostly contributing libraries and frameworks that target specific problems with a wide audience while the non-corporate entities typically target problems with small audiences, such as languages. The size of the entities involved in a particular task is usually a good indicator of its audience and of the need for coordination (the bigger the number of entities, the bigger the need for coordination).
Corporate entities are more effective in projects where a certain amount of discipline and coordination is required (such as when defining the WS-* specs), while non-corporate entities are more effective in projects which do not require a large amount of coordination. The need for coordination between corporate entities is primarily driven by the number of these entities and the fact that these entities have different, sometimes competing, interests to which a common denominator has to be found.
At the opposite end, with little need for coordination, you find most of the entities that created the languages currently running on the Java platform: these languages are initiated by individuals and are maintained by a single team that is usually pretty small.

One important thing about the contribution of corporate entities to the Java language in terms of libraries/frameworks is that these corporate entities need their huge investments in libraries/frameworks to remain relevant in the future. They very much need backward compatibility in order to have the stability required for continuing to make these contributions. This is very important to consider when thinking about truncating Java (see the small section below) and when making language changes.

The mushrooming of JVM languages (***). You would expect that once the JRE got modified to allow easier access to different languages on the Java platform, the number of Java variants would grow very large, to the point of becoming unmanageable. This has not happened so far, and if I were to identify the reasons for this I would name 1) the entry barrier for creating a language (you need to design it well) and 2) the need to keep the investments made when committing to a particular language relevant. The small number of languages used on the Java platform, in sharp contrast to the number of Lisp variants, is due to the fact that when an entity adopts such a language it makes a commitment to use it in order to keep development costs down (you don’t want your web development shop to work in 10 different languages, each needing a guru). The need to keep development costs down, which doesn’t exist in Lisp’s world because Lisp is used by academia rather than by everyday coders, puts a ceiling on the number of languages running on the Java platform. I would expect this pressure on the number of languages running on the Java platform to persist in the future: new JVM languages will start being used only if they truly offer gains(****).

Open sourcing Java. Just like the mushrooming of JVM languages, the mushrooming of Java-like languages which branch out of the main Java language maintained by Sun didn’t happen (looking at the hundreds of Linux variants you would have thought that it should). I think that this is due to exactly the same thing that prevented the mushrooming of JVM languages: the commitment that using a language entails.

Truncating Java. The JRE got too big and even with today’s networks it is still a pain to download and install it. Split it up in OSGi bundles and spin them off the runtime (with Java AWT and Swing being the first victims). Keep in the runtime only what is absolutely necessary for the current enterprise libraries to work well (the collections library, threads, etc…).

Competition from Ruby. Ruby appears as a serious contender to Java, supposedly gaining market share and mind-share. I don’t think that this will last long, primarily because Ruby will not have the resources for re-creating the current Java libraries. I also think that re-creating Java’s libraries in a different language is a waste of time. My opinion is that it is beneficial to learn Ruby in order to use JRuby and tap into the vast libraries currently available on the Java platform.

My opinion on Java’s present and future. I think that the current split in contribution to the Java platform (corporate contributors generating libraries, frameworks, products, etc… and the non-corporate contributors creating languages) is correct. Let each camp go forward in its own way: the corporate entities will continue to produce the specs, libraries and frameworks that we all use and that we all need to be relevant in the future, while the non-corporate entities will work on creating new languages from which to call those libraries. There will not be a mushrooming of languages because the costs associated with using a language will keep the number of languages down. There will be some healthy competition from various languages that attain buzzword status, but it will not last for long.
This would conclude my post. I admit that its subject (Java’s present and future) is very broad and covering it in one blog post is very hard but I have these opinions that I want to share with the world.

* By non-corporate entities I mean most small cohesive groups that grow around Java such as the groups maintaining Groovy, Scala, etc… By corporate entities I mean both large corporations involved in various specs (such as IBM, BEA, Oracle, etc…) and small corporations such as Spring Source or Red Hat. Also, the contribution of a corporate entity to the Java platform is not restricted to the contribution that entity makes in the open-source space, but it means all the code that entity has created in the Java platform, closed source as well. I know this definition borders somewhat on using the language rather than contributing to it, but I will keep it this way.

** I know that JVM languages are also contributed by corporate entities, JRuby being one such example (Sun actually brought JRuby under its umbrella after it had been started by a non-corporate entity).

*** By JVM languages I mean languages different from Java that run on the JVM such as Groovy, Scala, JRuby, etc…

**** Interestingly enough, Java platform languages would be a pretty good case study for word-of-mouth advertising: some guy created Scala, another guy tried it and blogged about it, a few more did the same, some other guy requested some features, etc…, and before you know it Scala is slowly being shoved into the spotlight.

No Comments | Tags: Development, Favorites

3 December 2007 - 18:29 Transaction* costs in software development

Probably the most interesting thing that I have read in quite a while is the paper “The Nature of the Firm” ** by Ronald Coase which can be viewed here. Coase answers a pretty interesting question: why in some cases it makes sense to gather resources together to carry out an economic activity, while in other cases it makes sense to delegate that activity altogether to an outside party.
The answer lies in the extra costs incurred in carrying out a transaction or, as they are known, in transaction costs. Let’s say that you want to have your own house. You have 2 choices: one is to sub-contract parts of building the house and orchestrate the activities of the companies that you have sub-contracted (the one that pours the foundation, the one that puts up the walls, the people that paint your house, etc…); the other is to delegate the whole operation to an external entity, the home builder. The costs of each choice are (at first sight) the following:
1) carrying out the orchestration on your own: paying for the company that pours the foundation, the one putting up the walls, the one putting up the roof, etc…
2) delegating this to the home builder and paying for all of the above (the home builder will have to pay the sub-contractors just like you do) + the profit of the home builder who will be doing the orchestration for you.
At first sight it appears that you should be doing all the stuff yourself and come off cheaper than by delegating this to the home builder. But apart from the costs of the subcontractors you also incur the costs of interacting with them: you need to inspect the work of each one because you don’t know how good that sub-contractor is, you need to look around for good subcontractors, you waste a lot of time negotiating with them, etc… No surprise that most people prefer to go with a builder***.

What The Nature of the Firm basically says is that if the costs of carrying out the activity on your own are less than the costs of delegating it to a dedicated party, you should carry out this activity on your own rather than delegate it. You can see Coase’s argument at work in the current outsourcing/offshoring trend: the transaction costs for outsourcing some operations are now so low that a corporation can efficiently delegate various tasks to outside partners.

So how would this apply to software development? I would say that transaction costs demarcate pretty well what needs to be grouped together and carried out by one entity (which assumes a specialization of some sort) and what needs to be delegated to a different party. This is exactly what is needed for architecting a project: you need to determine how to group various classes, packages and systems together so that the interactions between them are minimized. The interactions between systems (or classes or packages) can be thought of as transactions: one class calls the other to do some work just like a home-builder calls the carpenter to install the hard-wood floor. By looking at the transaction costs between various components of an application you can determine which transactions (method calls) should be consolidated into a component that handles them and which transactions should be allowed to exist on their own.

I would say that one big cost associated with a method call (or a call to a package/outside system) is extensive orchestration between various classes (a method that calls 15 other methods from 7 different classes). If you have a pattern of method calls repeating over your code-base you should probably take a step back and consolidate these calls into a component that will serve the purpose of that particular orchestration. This way you will create a specialized component to which you delegate this orchestration, rather than replicating it multiple times. Efficient specialization of the components will incur lower transaction costs because it will push down the number of interactions between components. For this you need domain knowledge. While designing a system you should look for specialization and for ways to achieve it, and once this specialization is achieved you should look for ways to interact with these specialized components efficiently. If this is done properly you will end up with a system in which the cost of interaction between various components is low.
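As a rough sketch of such a consolidation, assume three hypothetical collaborators (Payments, Inventory, BackOffice) that are always called in the same sequence all over the code-base. The names and the trace list are made up for illustration; the trace is there only so that the order of the calls can be observed.

```java
import java.util.List;

// Hypothetical collaborators; each one records what it did in a shared trace.
class Payments {
    private final List<String> trace;
    Payments(List<String> trace) { this.trace = trace; }
    void charge(String orderId) { trace.add("charge " + orderId); }
}

class Inventory {
    private final List<String> trace;
    Inventory(List<String> trace) { this.trace = trace; }
    void reserve(String orderId) { trace.add("reserve " + orderId); }
}

class BackOffice {
    private final List<String> trace;
    BackOffice(List<String> trace) { this.trace = trace; }
    void notifyOf(String orderId) { trace.add("notify " + orderId); }
}

// The specialized component: the only place that knows the sequence of calls.
class OrderFulfilment {
    private final Payments payments;
    private final Inventory inventory;
    private final BackOffice backOffice;

    OrderFulfilment(List<String> trace) {
        this.payments = new Payments(trace);
        this.inventory = new Inventory(trace);
        this.backOffice = new BackOffice(trace);
    }

    // Callers make one call instead of repeating the three-step orchestration.
    void fulfil(String orderId) {
        payments.charge(orderId);
        inventory.reserve(orderId);
        backOffice.notifyOf(orderId);
    }
}
```

Every caller now has a single interaction (fulfil) with one component, and a change to the sequence is made in one place.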

Another big cost for interacting with a different system is knowledge about the other system or class. The more knowledge you need to have about an external entity, the more costly it is for you to interact with it: you will need to be kept up-to-date with changes in this knowledge, there is a ramp-up cost, you can handle only so much knowledge, etc… It is a good thing to minimize the knowledge required for interacting with an external entity, because this way a component can interact more efficiently with other components.
One corollary would be that you need to externalize as little as possible from any component and try to keep most of its sub-components private. One side effect of having too much of a component available is that you run the danger of a component coupling to it unnecessarily.
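One way to read this corollary in Java terms is sketched below. PriceCalculator and RateTable are hypothetical names and the 20 percent rate is an arbitrary illustrative value; the only point is that the sub-component is private, so no other class can couple to it.

```java
// Hypothetical component: callers see one method and nothing else.
class PriceCalculator {

    // The only knowledge a caller needs: net amount in, total amount out.
    public long totalInCents(long netInCents) {
        return netInCents + RateTable.surchargeFor(netInCents);
    }

    // Private sub-component: invisible outside PriceCalculator,
    // so it can change without affecting any caller.
    private static final class RateTable {
        private static final long RATE_PERCENT = 20; // illustrative value only

        static long surchargeFor(long netInCents) {
            return netInCents * RATE_PERCENT / 100;
        }
    }
}
```

With this layout a call such as new PriceCalculator().totalInCents(1000) is the whole interaction surface; how the surcharge is computed stays an internal detail.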

One cost for interacting with an external system is API coupling: the fact that you need classes exported by that external system**** in order to connect to it. If your system shares classes with an external system you will need a way to synchronize the shared classes with the external system. This is usually done either by having the external system keep different versions of its points of interaction or by having a mechanism that publishes the classes used for interaction. Either alternative is pretty costly, so you may try to keep away from sharing classes with an external system (though sometimes it may be impossible).
Transaction costs can be applied in project management as well: you should look out for complex interactions between teams and try to assign them to the resource best suited to handle them (this resource could be a team-member, a well-defined process, etc…). However, in this case you should avoid assigning this interaction to a real person because you will run into scarcity constraints (i.e. you will run out of people) very fast if you keep delegating to real people. This is why these interactions are probably best assigned to a process.

I think that developing a software project while keeping an eye on the transaction costs is a good approach.

Later Edit. Transaction costs are very useful in determining the interaction between large components (systems, services, applications) and they are probably best used for managing this interaction. When you design the interaction of various systems the interaction costs (which are transaction costs) should be kept to a minimum for that interaction to occur without many problems.

* In this post transaction will be used for its economic meaning: i.e. it will stand for an exchange between 2 parties.

** If I am not mistaken Ronald Coase received the Nobel prize in economics for his work on transaction costs and this paper.

*** Please note that there are people that actually build their houses on their own. The vast majority of them are in the construction business or have connections in the construction business, so their transaction costs are a lot lower than for the ordinary person.
On a different note, the orchestration needed for building a home (calling the people that put up the sheetrock after the 2x4s have been erected, calling the painter for painting the walls after the walls have been put up, etc…) is a very simple business process without any intrinsic value and prone to copying. No wonder that the builder itself makes a pretty small margin while the lion’s share of the profits goes to the entity which owns the land…

**** You are generally sharing classes with an external system if you interact with it via RMI or EJB. Interacting with an external system via WS-* or REST is a lighter coupling.

No Comments | Tags: Econo-computing, Favorites