27 October 2009 - 13:13Non-blocking flows

Recently I was working on a business flow to which we had to add a new requirement: grouping a particular type of transactions under a file. The file had to be unique per day, it had to be created on the fly when the transaction batch starts getting processed and the transactions had to be assigned to it at the end of processing. The first solution that one could think of is to change the flow to check if the file exists (and if no then we would create it) and after this check we would assign the transactions to that trade file.

However, doing only this would pose a concurrency problem, namely that two or more transactions batches arrive at the same time when no trade file has been created yet. If each transaction batch would check if the trade file exists concurrently and try to create it, again concurrently, we could end up with duplicate trade files. One way to avoid duplicate trade files is to detect if a trade file needs to get created, allow one of the transaction batches to create the file while blocking the other transaction batches till the trade file gets created. We looked at the costs of blocking and as the costs looked pretty small (we would be blocking only once time per day when the file gets created) we decided to go ahead with blocking.

However, this approach clearly doesn’t scale, and we implemented it because it the conditions for blocking happen very rarely (as I was saying once per day) and it would not be feasible in the case of a higher amount of contention. We looked at some non-blocking alternatives and it looks like a good one would be to allow the transactions to check if trade file exists and if not then to create the trade file on the fly (without blocking) and at the end of transaction processing send further a message saying that there is a risk that some data is inconsistent (namely that there is the risk that some files have duplicates and transactions are assigned to duplicate files) and establish a procedure for repairing the transactions (if necessary). This would allow for non-blocking flows and higher thru-put, but it would come at the expense of a period of time in which data is inconsistent (in our case there is the risk that some transactions will be assigned to duplicate trade files till the duplicate trade files get fixed).

If inconsistent data is OK for the business and the rest of the application (it could be that these repair procedures as well as inconsistent data affect other parts of the application) and if blocking flows are creating significant performance problems then allowing for data to be inconsistent for a certain period of time while providing a mechanism for detection and repair of inconsistencies would probably solve the problem.

Another solution to this problem would be to detect messages which may cause blocking and create a new stage in the flow which deals with such messages.

No Comments | Tags: Development, Favorites

2 July 2009 - 16:47Key-value stores and relational databases

Relational databases are going thru a rough patch right now, some pundits going as far as writing them off. Their main problem is the fact that they are constrained to run within the same physical box (*) and scaling out is pretty hard. Once the datasets reach a certain size you will probably need to look beyond the typical relational database at other ways to store your data.

One type of alternative datastore that is getting a lot of attention is the distributed key-value store which maps values to keys and then assign keys onto multiple storage nods according to a hash function (**). In such a setting you get an object thru a key, you work with it and then you save it back in the datastore. The fact that this configuration scales out easily (***) makes this datastore very appealing, and if you are working with large datasets you will probably have no choice but to use something like it.

The transition from relational databases to key-value stores will probably include taking relations out of your application, re-modelling your application in order to group data together and streaming data to a reporting database. Relational databases gave the user both data storage as well as relations between entities which came as both a blessing and a curse (while relations made reporting easier they also gave un-checked access to data which allowed all sorts of corners to be cut). Well, key-value stores will take relations out: data access is more local, you typically have access only to what is immediate to a particular entity and you cannot perform queries spanning your entire data model. This type of data access could turn out to be actually a good thing, because these very close relationships will force an application to have a cleaner design since you will not be able to rely on monster SQL statements for compensating for design deficiencies.

In an application which needs to handle massive amounts of data relations between data will be delegated to activities which require them (reporting is a pretty good example) and data will be streamed from the key-value stores to the relational databases where these activities take place.
All in all, a pretty good division of tasks: applications with a (hopefully) cleaner model relying on key-value stores streaming data to relational databases for reporting.

All the above doesn’t mean that relational databases will disappear, the vast majority of applications do not require to process the massive amounts of data which render your typical database unpractical. Relational databases will be around for quite a while.

* Obviously, there are database clusters (such as Oracle RACs), but they are pretty costly.

** Examples of key-value stores are Google’s Big Table, Amazon’s Dynamo and IBM’s eXtreme scale.

*** Actually, there are constraints attached to it. In order to get the best performance you need to model your application so that data access occurs within the same node in order to avoid a distributed transaction spanning across multiple nodes. Please read this article by Billy Newport from IBM for a better understanding.

No Comments | Tags: Development, Management

31 March 2009 - 20:01Assigning responsabilities

I think that this presentation by Rebecca Wirfs Brock on driving the design from responsabilities pretty much hits the nail on the head: responsabilities drive design and good design is a good separation of responsabilities. From what I have seen most of the errors in software development arise either from grey-zones in which multiple responsabilities are implemented and overlap or from responsabilities which spread across multiple components. Both are examples of incorrect mapping of responsabilities to components.

All in all, a very good presentation.

No Comments | Tags: Development

23 September 2008 - 19:34OOP inheritance

Typically OOP inheritance is used for sharing various common methods and a typical example would be something like this:
You have interface Vehicle which has sub-classes Bicycle and MotorVehicleMotorVehicle in turn has the sub-classes Car and Truck, each with its specifics. Car, Truck and Bicycle are all sub-classes of Vehicle and as such are all inheriting the same common behavior, such as some object which specifies the speed limits within which each Vehicle should operate (for example a Car should be able to handle higher speeds than a Truck or a Bicycle).
Fair enough, and this way to use OOP inheritance has its reasons for use.

However, one other reason for using inheritance that I see is for creating substitutes in order to partition code bases. In this case you are using sub-classes not because you need some shared behavior to be encapsulated in a super-class, but because you have distributed teams working on the same codebase and you need to minimize collisions between them in the codebase (a collision would be 2 or more persons working on the same file and is usually resolved by a merge in CVS). Solving conflicts thru CVS merge is a pretty error-prone process and doesn`t scale out well, so at one point it is necessary to partition development so that different teams can work on the same project without elbowing each other.
Sub-classing in this case would solve this problem by letting a developer substitute its team’s type for another team’s type.

One example is using helper classes which service a POJO and which are used by more than one team:
Let’s say that we have a TaxProcessor class that uses a TaxCalculator for calculating the taxes to be applied to a trade. Let’s say that you have 2 teams, one working on taxing bond trades and one working on taxing equity trades and that they are all using the same TaxCalculator class on the TaxProcessorTaxCalculator has the method taxTrade which delegates to taxBondTrade and taxEquityTrade. As time goes on a lot of fixes are put into taxTrade in order to deal with various corner-cases to the point where 1) TaxCalculator becomes hard to maintain and 2) more than one developer is working on the same method at the same time. A solution would be to sub-class TaxCalculator into BondTaxCalculator and EquityTaxCalculator, have each implement the method taxTrade according to its specifics and then have TaxProcessor use the appropriate class according to the trade type.

As you can see from above good software models imply good separation of concerns. Good separation of concerns translates in good separation between development teams working in parallel on the same project. Good separation between development teams results in low transaction costs or interaction costs between teams and into a more efficient way to delegate work and in a development environment which scales better.

No Comments | Tags: Development, Management

28 August 2008 - 13:13Technical debt

You are probably familiar with Martin Fowler’s Technical Debt, a metaphor around the idea that doing things the quick-and-dirty way creates bad code, bad code which can be viewed as a debt which needs to be paid later in installments (the principal is the re-factoring of the bad code and the installments are the extra effort that this bad code forces on development). The concept was originally set-up by Ward Cunninghan.

Technical Debt is pretty much different from ordinary, everyday debt primarily because there is no creditor and no maturity date. Also, unlike regular debt, it is very hard to quantify. Technical Debt seems a bastardized version of regular debt in the sense that some development costs are identified as the principal (the re-factoring of the bad code) and some as the interest (the hacks used for dealing with this bad code) and it is built by taking some concepts out of the ordinary debt concept as the author sees appropriate in order to underline some development costs.
At the same time Tech Debt is defined exclusively from the point of view of the borrower who has to worry about paying the principal and the interest leaving out other actors (even though once you take the creditor out of the picture you have to wonder why you have to pay this debt at all ;-)). But, like Martin Fowler said at the beginning of his article, this is a metaphor, which leaves a lot of room for stretching various concepts. Fair enough.

However, Technical Debt seems to have been taken up and spawned quite a few siblings as you can see in this infoq article: liquid assets, moral hazard, fertile assets (???) etc…, which are either unfortunate enough to bear little or no resemblance to the original economic concepts they try to refer to or unfortunate enough to not refer to anything (fertile assets stands out as a prime example).
Let’s pick for example “Liquid Asset” which is defined this way: Perhaps the term “technical debt” focuses us on the wrong things; maybe focusing on the converse, on the investment side of things might be more effective.
First of all, a real-world definition: a liquid asset is an asset for which transaction costs are lower and for which there exists a market in which this asset can be sold and bought for a reasonable fee. Try to look into the above definition (or in this link which is provided next to it in the infoq example) and if you can spot the asset that is referred to, as well as the market in which this asset is sold and bought and the fees for these transactions please let me know.
Second, switching terms in order to motivate people displays how shallow these concepts are to begin with: if Technical Debt would be a concept that relates to somethings tangible in the real world, if it would manage to encapsulate a real problem or some real costs then Tech Debt should should not be swept under the rug for fear of under-mining the developers’ morale, but rather dealt with up-front because in doing so you would solve a real problem. The fact that Tech Debt is used the way this way points to the fact that it is so vaguely defined that few people can extract some somthing valuable out of it and that it can be stretched in any direction you want to.

What I see when I am reading the infoq article mentioned above are the first steps in the attempt to “marry” 2 fields which are pretty different one from another: economics and writing code and the excesses which appear when interest start to pick up in some vague concept (excesses similar to the introduction of the SOA concept). There is a desire to bring IT development under control and one way to look at it is to define costs and benefits associated with various actions and then apply economics to these costs and benefits in order to write code more efficiently. I don’t know where this will lead because we are at the beginning of exploring this.

As far as I am concerned I have developed an interest in economics a while ago and I read on it as much as I can. I also tried to “marry” the domains of writing code with and of economics, I even have an Econo-computing category, but I find it pretty hard. So far the only economic concept I found that can be applied pretty well to software development are transaction costs because they actually encapsulate pretty well some very real costs which arise when various entities are interacting one with another.

I will be following this Econo-computing field with a lot of interest and, who knows, maybe the concepts in this field will actually start to relate to something tangigle in the real world and will improve developer productivity.

No Comments | Tags: Development, Econo-computing

1 August 2008 - 23:12EDA, clustered caches and triggers

A while ago I went to a presentation where the speaker was talking about the migration of an application from a typical MDB-based work-flow application to an EDA-based application. One of the drivers of this migration was the fact that their application`s data has grown so large that the database storing it was having performance problems. One of the solutions which were studied was to move the data tier into a clustered, transactional cache which would update the database asynchronously, essentially moving the database to a simple store of data on which you could run reports.

Not a bad idea, the data would reside in this clustered cache and it would be made available to the application. One other thing that was considered was turning the whole architecture on its head and turn from the work-flow application into an EDA application. This is how it was supposed to come: the clustered cache came along with the capability of adding events to data as data was handled (inserts, updates, deletes, etc…). This would mean that while you previously had to have an MDB that was listening for a stock-quote request you could now code a stock quote object which you would throw in this clustered cache and which could listen for events. The clustered cache would have both flushed the stock quote objects to a database and would have call events on it. The stock quote object would in turn update other objects (let`s say positions for that stock) and while these positions would get updated they would fire more events in turn. A part of the business logic would be stored in these events and in the message passing that occurs between them, in a sort of event-driven-architecture.

The more I think about this, the more I get the impression that these events are nothing more than triggers in a database (the clustered cache taking on the role of the database). And coding your business logic in triggers is not a good thing if you listen to people with experience in database programming. Relationships between various entities and the interactions between them should probably be modeled at a higher level than at object level, when you are coding business logic in triggers you are essentially limited to that object’s scope.

Now, the idea of putting the business logic into these triggers comes from a pretty hard problem: the fact that for most artchitectures out there the data store is physically away from the code that is implementing the business logic, the database which holds the data and which needs to be consulted by the business-logic tier for data. Bridging this physical distance and the performance problems that come with it was solved in various ways: pushing the business logic into the database in the form of stored procedure, addind various caching mechanisms so that data is pushed towards the middle-tier, etc… In the example above part of the reasons for pushing the business logic into these events was you would effectively push the business logic into the new data-tier (the clustered cache in this case which would trigger these events). I would say that this is a pretty interesting approach with some benefits, but attention should be paid to minimizing the number of events (triggers) which get implemented.
A decade of PL-SQL development would recommend this…

No Comments | Tags: Development

10 June 2008 - 12:20Grid design patterns - de-normalized data

I am watching with a great deal of interest developments in the grid and large scale computing environment because I always found distributed computing interesting.

One very interesting thing that I come across pretty often is the fact that in large scale computing data tends to get de-normalized, basically it grows so large that you cannot fit the entities and the relationships between entities in the same database (this infoQ article is pretty good).
Let me give an example: suppose that you have table of users and a table or messages between users having these columns:
users: userID int, firstName String, lastName String, email String
user_messages: fromUserID int, toUserID int, messageTest String.
In the initial design a message from user A to user B would have only one record in user_messages and each user will be able to get the messages it has sent and the messages it has received from this table.
Now let’s say that the number of users sky-rockets to the point where you need to partition your database horizontally into a series of identical replicas. The problem that you will face now is where to store the data that is shared between these replicas, namely the table user_messages. You cannot store it either in the replica hosting user A’s data, neither in the replica hosting user B’s data and neither in its own instance because it will grow too large and because you will need to carry out a join over 2 physically remote databases. The solution is to drop user_messages for received_user_messages (from_user_name, from_user_email, message_text) and sent_user_messages (to_user_name, to_user_email, message_text) which will store the messages that a user has sent and the ones that it had received. Each database replica will have its own instance of these 2 tables each mapped to the user table. As you can see we are not storing the userID in the message tables, but rather the user information which was previously retrieved via joins.
Sending a message from user A to user B in this environment typically involves updating 2 tables in 2 separate database instances which usually is treated as a distributed transaction. However, you cannot carry out this distributed transaction because it is very costly so sometimes you resort to updating one table and sending a message to update the second table asynchronously (the overhead for delivering a JMS message is way less than the overhead for locking rows in the second database). In the case of updating the second table asynchronously you are still enforcing the relationships between the user table and the user message tables, but at a later point in time.
De-normalized data comes with synchronization costs: when you are updating the user information on the users table you will need to propagate the changes to received_user_messages and sent_user_messages so that the data stored in these tables will be up-to-date. This could be done via asynchronous processing as well, sending a message about data being changed on an ESB and having the concerned parties listening for and processing it. Synchronization costs should be watched very carefully because they could spawn a very high number of messages. Ideally we would synchronize data which is updated rarely (such as user information) in order to keep the number of synchronization messages down. Carrying out synchronization procedures in batches could be a way to deal with a large number of synchronization messages with the side effect that it may increase the latency with which the synchronization takes place.

Large scale computing typically seems to resolve around asynchronous processing (because in transactions message passing is cheaper than database access) and de-normalized data across with the overhead implied by it. The relationships previously enforced by normalizing data are usually enforced by passing JMS messages which carry out the data changes asynchronously. This design is driven primarily by the large volume of data which cannot be serviced by only one database and by the costs of carrying out a distributed transaction spanning 2 databases.
This is one design that I see emerging for large scale computing: de-normalized data along with asynchronous processes which are enforcing the relationships between entities via message passing.

It remains to be seen if this type of grid architecture will prevail in the future. Working against it is the emergence of multi-core processors which would allow for scaling up cheaply as envisaged by Brian Goetz in this article. If chips with hundreds of multi-cores and with large amounts of memory become reality scaling out could end-up costing more than scaling up and all the above could become history pretty fast. Scaling out will continue to make sense in some environments with humongous amounts of data (think Amazon, Google, eBay, etc…) but for the current fastest growing segment in grid applications, typically applications processing moderately large amounts of data, it may make sense to scale up once chips with a large number of multi-cores become a reality.

To watch…

P.S. These shifts in data processing (moving to a grid-type of architecture in order to accommodate the increase in data and then back to a single-box architecture because of the appearance of chip with a large number of cores) makes a pretty good case for abstraction in order to lower the costs of transition from one architecture to another. Ideally the architecture and the environment in which an application runs should not affect the business logic of that application. One way to insulate your application’s business logic from these infrastructure issues is by abstraction.

No Comments | Tags: Development, Favorites

17 March 2008 - 17:22What is missing in OS testing tools

I was watching this infoQ presentation by Alexandru Popescu and Cedric Beust on testing when I realized that a big market for testing is seriously neglected.

Regarding my use of testing I would say that JUnit pretty much fills my bill and I don’t see a need to move to TestNG. I was watching the presentation not so much as to know more about TestNG, but to get some exposure to the market for testing products (*). So the features which were presented and which were said to be in a high demand from the users were the ability to define test groups, to test data connectivity and to define dependencies between tests (**). Pretty fair, I would take this functionality to be a logical progression from the ability to run a bunch of tests at random as JUnit lets you do it, you are basically starting to look at the tests you run from a higher level and you start looking for ways to aggregate these tests into higher-level constructs, such as test groups, which can then be manipulated according to various needs.
So far I would say that Test NG is looking at ways to manage the complexity coming out of a high number of tests. Test NG is for test-heavy shops, where testing is considered a concern on par or close to development. Not a bad thing and I pretty sure that TestNG covers some functionality which is in high demand by Java developers.

There is, however, one need for testing that so far it is largely unsatisfied and sorely missed: the need to test workflows or series of events. Let’s say that you have an application that is a series of MDBs, each accepting messages, transforming them and then outputting them to the next MDBs. You would want to be able to test this application end-to-end.
Let’s say that you have an application that receives market trades, needs to process them and then transform them in order to send them downstream to tax systems, settlement systems, etc… You would want to put thru a trade, set-up in a certain way and then trace its execution thru these flows and determine if, in what stage and in what shape has this trade reached the Application-to-Tax-System gateway.
When I think about work-flow testing I usually think about defining a message, inputting this message into a work-flow engine and then defining interceptors to see how the original message has been manipulated at various stages. You may need to test both the work-flow itself (to see if the message has been going thru the work-flow that it needed to go thru), the end-result (to see where the message has been forwarded to and in what shape) as well as how it behaves at various stages in this workflow (if needed). I see work-flow testing primarily concerned with interception of messages (and this would probably be a great use of AOP) and with the possibility to correlate messages passing thru this work-flow with the original message.

OS testing tools so far are limited to synchronous testing. I wonder at what point will the need for work-flow testing become so pressing and the demand for it so great so that one OS shop will start doing something about implementing a testing framework for testing workflows. Then we could use Event-Driven-Testing for testing an application written in a Event-Driven-Architecture manner…

* I know that trying to form an opinion on a market such as the market for OS testing tools from a vendor presentation is a pretty risky business, what you get from vendor presentations is usually distorted because of bias and time constraints but I will assume that this presentation gave a fair image of what users want from testing tools.

** From what I know TestNG is a lot more than some annotations that give you the possibility to form test-groups and test dependencies. However, I would say that these issues are probably considered more important and more aligned with the market for OS testing tools since they were the ones which were included in this presentation.

P.S. If I were to choose one enterprise concern which is not addressed by the Spring stack I would choose work-flow. Spring has modules for integration, batch, transaction management, security management, connectivity to various end-points, etc…, it is missing this capability. From what I see work-flows are used pretty heavily in the enterprise space and for now I think they are mostly implemented either by in-house solutions or by commercial solutions. OS could probably make a contribution in this space as well. And if it does it should probably try to give the developer the ability to test work-flows, or at least make it easy for him/her.

P.P.S. The comments by Alan Keefer on this thread reinforce my beliefs that we need to carry out tests at every level, from high-level to unit test. For a work-flow based system this would mean that we need to test the work-flows themselves and not only the units making them up.

Later edit:  There are actually a few OS workflow tools. OSWorkflow, the ones at the bottom of this article and probably a few others. I should try them at one point.

No Comments | Tags: AOP, Development, Favorites

12 March 2008 - 1:54Pros and cons of standards

I was reading Bill Burke`s post on transaction compensation via REST and JBPM and I have to tell that I agree with most of his remarks. Bill makes a very interesting point about compensating a transaction: that the compensation itself is an activity of a business process (the activity of handling failure, very likely outside of the system involved) and that this activity could be implemented as a regular activity in a business process using jBPM. It could also be exposed to the outside world thru a REST service.

This is a very interesting take on compensations and his proposition (that compensations are regular business activities that are part of a business process and that could be coded as such) gives a lot of flexibility to handling compensations. However, is exactly this flexibility that will probably force someone away from the compensation scheme devised by Bill and back to WS-BA. As Mark pointed out you still need to enforce end-points to run the same version of the communication protocol, be it WS-BA or an ad-hoc REST-based protocol.
However, I would go further and say that the flexibility devised by Bill runs counter to market acceptance. Standards are straight jackets which people choose to wear when it makes sense, typically when it makes some process (such as interacting with a different system) more efficient. Coding a business activity in WS-BA would enable you to plug it effortlessly different systems implementing the WS-BA standard, in theory at least. You would expose your application to the outside world effortlessly and you could interact with a greater number of systems because you have chosen to trim down your application to fit into the WS-BA straight-jacket.

On the other side, if your application cannot be forced into the WS-BA standard probably it would make sense to drop this standard and expose the compensation logic as you and your partners agree. It would not be the first time that a standard gets ditched for a proprietary solution particular to a few partners and certainly not the last time. Sometimes a straight jacket is just too stifling…

No Comments | Tags: Development

22 February 2008 - 3:16DTOs are not rich beans

Of all the problems of Data Transfer Objects (DTOs) the one that stands out is the anemic domain problem: DTOs are basically objects with a few variables and getters/setters for them, lacking any complex behavior. This way of developing applications has been decried times and times again so I will not waste my time exposing this problem.

However, I started realizing that maybe DTOs should not contain any complex behavior and that they should be as dumb as dirt, especially when being used for communication between multiple systems. Consider this example: we have a tax system which uses the object TaxInformation which various systems use for getting information about taxes. Let`s say that the commodity trades system, the equity trades system and the FX trades system are all using it. Let`s say that this object looks like this:

public class TaxInformation{

private Trade trade;

private TaxReceipt taxReceipt;

public TaxInformation(Trade trade){

this.trade = trade;

this.taxReceipt = new TaxReceipt();

}

public TaxReceipt getTaxReceipt(){

return taxReceipt;

}

}

As you can see a DTO dumber than dirt. Let`s create an endpoint creating and serving such an object:

public class TaxService{

public TaxInformation getTaxInformation(Trade trade){

TaxInformation taxInfo = new TaxInformation(trade);

return taxInfo;

}

}

All is fine, but a requirement comes in which says that if the trade is restricted then the tax receipt of the tax information object related to that trade should also be restricted. Here you are presented with a few choices:

1) Implement this requirement in the TaxInformation object probably in the constructor.

2) Implement this requirement in the TaxService object.

3) Implement this requirement in the TaxReceipt object, in its constructor. For the sake of argument, let`s assume that we do not have this possibility, so I turned it off.

OOP purists will immediatelly recommend choice #1, because this would turn the DTO into a rich bean and would do away with the anemic DTO. This is a mistake, because all the systems using this object for communicating with the tax system will start having class versioning errors unless they are updated with this DTO`s new class. The problem is versioning such a DTO. Putting complex behavior in a DTO raises the risks of change in the DTO, which raises the effort of propagating this change in the systems which use this DTO for communication. At one point it makes sense to drop DTOs and use some sort of protocol for communication, preferrably a protocol which deals with change pretty easily (XML documents with schemas which always get extended and with constraints which get changed very rarely are a pretty good fit).

DTOs should probably be used in systems whose lifecycles are kept in synch (i.e. they get upgraded in synch), in such systems you can propagate type changes pretty easily because you have a certain amount of control over their lifecycles (*). Expanding on this, I would say that typed languages are probably most effective locally because type changes cannot be propagated over large distances efficiently. Type-coupling (or API coupling) is a very hard coupling and its should be avoided and replaced with protocols. DTOs do create an anemic model and should be avoided if possible, but if you choose types for communication between systems it would make sense to keep these DTOs to their most basic function of passing data around or to have procedures for propagating type changes in all the systems using types for communication.
Sometimes weak DTOs are actually a good thing…

* BTW, whether a set of systems whose lifecycles are kept in synch is actually one big system with one particular lifecycle is a pretty good question. I would say that you can think about it as a big system.

No Comments | Tags: Development