24 October 2010 - 12:54
Software architecture in 2010

I just finished reading this article by Jean-Jacques Dubray and I am left with the impression that the author blows some IT developments out of proportion.

The article showcases how the Massachusetts Department of Transportation decided to open up a bus schedule data feed and found that within a short period of time a number of client applications started using it. What is really interesting is that these applications provided basically the same functionality across a large number of platforms (iPhone apps, web sites, SMS and IVR systems). In effect, MBTA delegated the development of client applications across multiple platforms to the parties best suited for doing so.
A pretty neat division of labor emerged: the management of the data feed is carried out by MBTA while the client apps are developed by people at large.

Interesting as it may seem, this development is neither new (mash-ups have existed for quite a while) nor the beginning of a revolution in IT. This development is just the creation of an eco-system of applications built around MBTA’s data feed. I don’t share MBTA’s enthusiasm over it because delegating the development of client applications to benevolent third parties undermines the reliability of the bus schedule delivery. While this approach did allow them to cover a large number of client platforms without a significant investment, it has also created a number of applications which seem to be nothing more than a hobby and which may be taken down without any notice. The eco-system of client apps will evolve over a certain period of time before settling down. During this period the delivery of bus schedules will be as fickle as the applications in charge of it. I am not sure that this is what MBTA had in mind when it opened its data feed to outside parties…

All in all, an interesting experiment in the relationships between various applications, but not a tipping point for the IT industry.

Tags: Development

19 September 2010 - 12:10
Domain event driven architecture

This presentation on domain event driven architecture by Stefan Norberg is a must-see for anyone interested in loosely-coupled distributed systems. The solution of implementing cross-cutting concerns over a large number of heterogeneous systems by having these systems issue domain events is one of the most elegant I have seen.

A point that I found very interesting is that systems must communicate via events and not via commands. For a system (master system A) to issue a command on another system (executing system B) it needs to know the context in which the command should be issued on system B. Requiring system A to have knowledge about the inner workings of a remote system in order to be able to carry out its tasks is typically how tight coupling between systems starts.
The solution outlined by Stefan Norberg is to split the business logic for a specific domain into three tasks: gathering the data required for executing a certain task, transforming it into events and issuing the events (performed by multiple systems); channeling the events to the system implementing the domain (typically performed by an ESB); and implementing the domain (performed by the system which receives the events from the ESB and uses them either for creating domain-specific commands or for creating the context in which domain-specific commands are executed). This partition of tasks scales very nicely as the number of systems and domains grows.

Another point that I found interesting was that this architecture requires the delivery of events in the order in which they were created. Some events are used for creating the context in which other events are transformed into commands, and therefore the events that create the context need to be processed before the events which are transformed into commands.
Let’s give an example. Suppose that the systems involved are a user management system, a bet processing system and a system managing user loyalty. Let’s say that we have this use case: a user creates an account and then bets $5000 on a horse race. For the sake of argument let’s assume that this use case involves the generation and dissemination of 2 events: a UserCreatedEvent and a UserPlacedBet event. Now let’s suppose that the system managing user loyalty receives the 2 events and is ready to process the UserPlacedBet event by checking the size of the bet and carrying out the appropriate command. Let’s say that for this command to be executed the user must exist in this system’s database. It follows that the UserCreatedEvent should have been processed before the UserPlacedBet event in order to create the context (in this case the context consists of relevant user data) in which the UserPlacedBet event could be turned into a command and executed.
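To make the ordering requirement concrete, here is a minimal Python sketch of the loyalty system from the example. The sequence number on each event, the bet threshold and the "award_points" command are assumptions made for illustration; they are not taken from the presentation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    seq: int       # hypothetical sequence number assigned when the event is created
    kind: str      # "UserCreatedEvent" or "UserPlacedBet"
    user_id: str
    amount: int = 0


@dataclass
class LoyaltySystem:
    users: set = field(default_factory=set)
    commands: list = field(default_factory=list)

    def handle(self, event: Event) -> None:
        if event.kind == "UserCreatedEvent":
            # Context-building event: it only records that the user exists.
            self.users.add(event.user_id)
        elif event.kind == "UserPlacedBet":
            # Command-producing event: it needs the context to be in place.
            if event.user_id not in self.users:
                raise LookupError(f"unknown user {event.user_id}")
            # Hypothetical rule: large bets earn a higher loyalty tier.
            tier = "gold" if event.amount >= 1000 else "standard"
            self.commands.append(("award_points", event.user_id, tier))


def deliver_in_order(system: LoyaltySystem, events: list) -> None:
    # Restore creation order before handing the events to the system.
    for event in sorted(events, key=lambda e: e.seq):
        system.handle(event)
```

Handing the UserPlacedBet event to `handle` before the UserCreatedEvent raises `LookupError`, which is exactly the failure that ordered delivery is meant to prevent.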

A presentation which covers the use of domain events from a large number of angles (enterprise architecture, management, physical implementation) and which should be watched by anyone interested in large scale distributed systems.

Tags: Development, Management

31 July 2010 - 16:11
Domain driven development and agile methods

Watching Eric Evans explaining how to incorporate agile methods in domain driven design gives you an understanding of some of the differences and commonalities between domain driven development and agile methods.

Agile methods grew out of the fact that up-front design which involved months of analysis was getting out of sync with a world which required faster development cycles and the ability to adapt to change at high speed (*). However, the responsiveness of agile methods and their focus on the next iteration limited the scope at which development is carried out and eliminated the design phase from the development process. The result was applications for which development slowed, or even stopped, once the application reached a certain complexity.

Eric argues that in order to fix this problem we need to bring domain driven design back into the picture, but not in the top-down, process-heavy manner in which it was conducted before. The main reason for modelling an application iteratively is that the development of an application is a learning process, a process of discovering and annotating a particular domain.

Some agile techniques, such as the ability to react swiftly to changes and to perform significant changes late in the game, are presented, but they seem to me to be liabilities rather than assets. Performing a significant change late in the game is a very expensive operation since it is essentially a re-organization of the concepts encapsulated in the model followed by the confusion generated by the dissemination of the new model; such things should be avoided at all costs. Rapid changes in the model are also bad since they can create confusion.

One agile practice which I think benefits domain driven design is the breakdown of the design process into iterations during which new domain concepts are inserted into the model. It is very important that these iterations also make sure that the model stays in sync with the domain in order to avoid late, massive and expensive domain re-factorings. In order to do this Eric outlines a series of diagnostic measures used for detecting when the model is straying away from the domain.
While these measures are necessary, what is missing in this process is a way for the domain expert to validate the model. The knowledge transfers have no feedback loop from the domain experts. The developer seems to be the sole owner of the model; ideally the domain expert would become more engaged in the definition of the model rather than being a passive disseminator of knowledge.

Modelling brand new domains or domains in a state of change will probably always be error-prone because the development of an application in such a domain is a discovery process for all the parties involved (from users to business analysts to developers) whose end-result is defining the domain. In such an environment probably the only way to define and disseminate domain knowledge is to set up sessions with domain experts. However, knowledge transfers in mature domains should be handled differently in order to leverage the existing knowledge. It is a good idea to get into the domain by first reading up on it and then engaging in knowledge transfers with domain experts.

Eric Evans covers a lot of ground in his talk, from the methodology of transferring domain knowledge to diagnosing misalignments between the model and the domain, solutions for correcting such misalignments and strategies for coordinating teams working on the same project. However, I wish he had talked a bit about the structure and skill-sets of the teams involved in domain-driven development, about the ways to disseminate domain knowledge into a team and about how to engage the domain experts in reviewing and providing feedback on the domain being developed.
All in all a very good presentation which I encourage people to watch.

* The rejection of the analysis phase by agilists could partially be explained by the fact that the agile movement started at a point when brand new domains appeared (such as e-commerce, e-marketing, content management) which were in a continuous state of transformation and for which there was no prior knowledge. Analysis in this case was viewed, to a certain extent correctly, as a process which requires significant investments (in order to overcome the lack of knowledge) without a clear return (the results could be obsolete in a few months due to unforeseen changes).
At the same time the revolution in communications created the opportunity to add on stakeholders previously considered unrelated to the domain. Encapsulating requirements from unforeseen stakeholders in an application meant that the domains became more prone to change and harder to predict.
These factors contributed to the misconception that analysis hinders rather than helps.

Tags: Development, Management

29 March 2010 - 13:31
Replaying requests in flows

If you have followed various presentations on Event Driven Architecture for a while you must be familiar with one advantage that many people talk about without going into detail: the ability to recover from crashes simply by re-playing the events that were sent to your system. Most presentations give the impression that a flow-based system based on passing messages is born with this ability, but the reality is that the system must be explicitly designed to provide it.

When designing such a system you first have to ask yourself if you need this ability and I would say that the answer in quite a few cases is yes. The most basic recovery from crashes for a flow-based system consists of the message broker booting up, determining what messages have to be sent and re-sending the messages to the message consumers. Chances are that re-sending the exact messages that caused the crash will cause another crash and in order to avoid this you should be able to wipe out the message store, determine what events need to be re-played and replay them in an orderly fashion till your system goes back to its normal state.

Next, you should determine how to design such a system. One way to design it would be to code the stages as idempotent operations, that is, operations which when carried out multiple times give the same results. However, sometimes the stages of the model of the system are not easily captured in idempotent operations and sometimes it is downright impossible (*).
Another way to design it would be to break the flow into stages and the stages into 2 categories: idempotent stages and non-idempotent stages (**). Next, record the requests that come in and record each stage that a request has completed successfully. For non-idempotent stages also record the state of the request after their completion. Replaying requests in such a system consists of determining what requests are in non-idempotent stages and replaying them from these stages. For example, let’s say that you have a system that accepts orders, performs matchings on them (matches buys to sells), creates fills out of these matches and sends the fills out to an outside application. This system has 4 stages: order receival, order matching, fill creation and fill forwarding. Let’s say that order receival is a non-idempotent operation, order matching is an idempotent operation, fill creation is a non-idempotent operation and fill forwarding is another non-idempotent operation. In order to design such a system for replay you will need to track each request across all the stages, determine which requests are in non-idempotent stages and, should you need to replay the requests, replay them from those stages.
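One way to read the design above is sketched below in Python. The Stage and Journal names, the in-memory journal and the rule of resuming right after the last recorded non-idempotent stage are assumptions made for illustration; a real system would persist the journal and would have to decide what to do about a non-idempotent stage that crashed half-way through.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Stage:
    name: str
    idempotent: bool
    run: Callable[[dict], dict]


@dataclass
class Journal:
    # request id -> list of (stage name, state snapshot or None)
    completed: dict = field(default_factory=dict)

    def record(self, request_id: str, stage: Stage, state: dict) -> None:
        # Only non-idempotent stages need their resulting state saved.
        snapshot = None if stage.idempotent else dict(state)
        self.completed.setdefault(request_id, []).append((stage.name, snapshot))


def process(request_id, state, stages, journal, start=0):
    # Run the stages from position `start`, journaling each completion.
    for stage in stages[start:]:
        state = stage.run(state)
        journal.record(request_id, stage, state)
    return state


def replay(request_id, initial_state, stages, journal):
    history = journal.completed.get(request_id, [])
    resume_at, state = 0, dict(initial_state)
    # Find the last completed non-idempotent stage and restart after it.
    for index, (_, snapshot) in enumerate(history):
        if snapshot is not None:
            resume_at, state = index + 1, dict(snapshot)
    # Idempotent stages after that checkpoint are simply run again.
    journal.completed[request_id] = history[:resume_at]
    return process(request_id, state, stages, journal, start=resume_at)
```

With the four stages of the order example, a request that crashed after order matching would be replayed from the state saved after order receival: matching runs again (it is idempotent), while order receival is not repeated.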

Replaying requests could also help you release a new version of the system in which the classes of objects which are sent from one stage to another change. Typically when such a release is carried out any message in transit cannot be processed anymore because of class versioning exceptions. Adding the option of replaying the requests after updating the flow with the latest release would help solve this problem.

* One example of an operation which may not be able to be made idempotent is sending messages to an external party. For this operation to be idempotent it would be necessary for the external party to be idempotent (that is, it would mean that the same message sent multiple times to the external party would have the same effect). This assumption sometimes turns out to be invalid.

** One example of an idempotent stage is a stage that performs some transformation/computation on the messages it receives and that forwards the messages to another stage. One example of a non-idempotent stage is a stage that persists data to a datastore or sends messages out to a non-idempotent external application.

Tags: Development, Management

27 October 2009 - 13:13
Non-blocking flows

Recently I was working on a business flow to which we had to add a new requirement: grouping a particular type of transaction under a file. The file had to be unique per day, it had to be created on the fly when the transaction batch starts getting processed and the transactions had to be assigned to it at the end of processing. The first solution that one could think of is to change the flow to check if the trade file exists (and if not, to create it) and after this check to assign the transactions to that trade file.

However, doing only this would pose a concurrency problem, namely that two or more transaction batches arrive at the same time when no trade file has been created yet. If each transaction batch checked concurrently whether the trade file exists and tried to create it, again concurrently, we could end up with duplicate trade files. One way to avoid duplicate trade files is to detect if a trade file needs to get created and to allow one of the transaction batches to create the file while blocking the other transaction batches till the trade file gets created. We looked at the costs of blocking and as the costs looked pretty small (we would be blocking only once per day, when the file gets created) we decided to go ahead with blocking.
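The blocking approach can be sketched in a few lines of Python. This is a single-process illustration only: the TradeFileRegistry class, the in-memory dictionary and the file naming are assumptions, and a flow spread over several machines would need a database constraint or a distributed lock instead of a thread lock.

```python
import threading
from datetime import date


class TradeFileRegistry:
    def __init__(self):
        self._files = {}              # day -> trade file name (hypothetical store)
        self._lock = threading.Lock()
        self.created = 0              # how many files were actually created

    def get_or_create(self, day: date) -> str:
        # The check and the creation happen under one lock, so concurrent
        # batches block here until the first one has created the file.
        with self._lock:
            if day not in self._files:
                self._files[day] = f"trade-file-{day.isoformat()}"
                self.created += 1
            return self._files[day]
```

Every batch calls `get_or_create` before assigning its transactions; only the first caller of the day pays for the creation, the others wait briefly and then receive the same file.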

However, this approach clearly doesn’t scale. We implemented it because the conditions for blocking happen very rarely (as I was saying, once per day); it would not be feasible in the case of higher contention. We looked at some non-blocking alternatives and it looks like a good one would be to allow the transactions to check if the trade file exists and if not to create the trade file on the fly (without blocking), and at the end of transaction processing to send on a message saying that there is a risk that some data is inconsistent (namely that some files may have duplicates and that transactions may be assigned to duplicate files) and to establish a procedure for repairing the transactions (if necessary). This would allow for non-blocking flows and higher throughput, but it would come at the expense of a period of time in which data is inconsistent (in our case there is the risk that some transactions will be assigned to duplicate trade files till the duplicate trade files get fixed).
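A repair procedure for the non-blocking variant could look like the Python sketch below. The data shapes (a list of file id and day pairs, a mapping from transaction to file id) and the rule of keeping the lowest file id per day are assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicates(files):
    """files: list of (file_id, day). Returns day -> sorted ids for days with several files."""
    by_day = defaultdict(list)
    for file_id, day in files:
        by_day[day].append(file_id)
    return {day: sorted(ids) for day, ids in by_day.items() if len(ids) > 1}


def repair(files, assignments):
    """Keep one file per day and move transactions off the duplicates.

    assignments: transaction id -> file id.
    Returns the surviving files and the corrected assignments.
    """
    remap = {}
    for ids in find_duplicates(files).values():
        keeper = ids[0]               # hypothetical rule: the lowest id wins
        for duplicate in ids[1:]:
            remap[duplicate] = keeper
    kept = [(file_id, day) for file_id, day in files if file_id not in remap]
    fixed = {txn: remap.get(file_id, file_id) for txn, file_id in assignments.items()}
    return kept, fixed
```

Between the moment the duplicates are created and the moment `repair` runs, readers may see transactions attached to a file that will later disappear; that window is the inconsistency the business has to accept.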

If inconsistent data is OK for the business and the rest of the application (it could be that these repair procedures as well as inconsistent data affect other parts of the application) and if blocking flows are creating significant performance problems then allowing for data to be inconsistent for a certain period of time while providing a mechanism for detection and repair of inconsistencies would probably solve the problem.

Another solution to this problem would be to detect messages which may cause blocking and create a new stage in the flow which deals with such messages.

Tags: Development, Favorites

2 July 2009 - 16:47
Key-value stores and relational databases

Relational databases are going thru a rough patch right now, some pundits going as far as writing them off. Their main problem is the fact that they are constrained to run within the same physical box (*) and scaling out is pretty hard. Once the datasets reach a certain size you will probably need to look beyond the typical relational database at other ways to store your data.

One type of alternative datastore that is getting a lot of attention is the distributed key-value store, which maps values to keys and then assigns keys to multiple storage nodes according to a hash function (**). In such a setting you get an object thru a key, you work with it and then you save it back in the datastore. The fact that this configuration scales out easily (***) makes this datastore very appealing, and if you are working with large datasets you will probably have no choice but to use something like it.
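The key-to-node assignment can be illustrated with a toy in-memory store in Python. The KeyValueStore class and the choice of SHA-256 with a simple modulo are assumptions made for illustration and do not describe any of the products mentioned in this post. Plain modulo placement moves most keys whenever the number of nodes changes, which is why production stores tend to use schemes such as consistent hashing.

```python
import hashlib


class KeyValueStore:
    def __init__(self, node_count: int):
        # Each dict stands in for one storage node.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key: str) -> dict:
        # Hash the key and map the hash onto one of the nodes.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key: str, value) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str):
        return self._node_for(key)[key]
```

The access pattern is the one described above: fetch an object by key with `get`, work with it, and write it back with `put`. There is no way to ask a question that spans all the nodes short of visiting each of them.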

The transition from relational databases to key-value stores will probably include taking relations out of your application, re-modelling your application in order to group data together and streaming data to a reporting database. Relational databases gave the user both data storage as well as relations between entities which came as both a blessing and a curse (while relations made reporting easier they also gave un-checked access to data which allowed all sorts of corners to be cut). Well, key-value stores will take relations out: data access is more local, you typically have access only to what is immediate to a particular entity and you cannot perform queries spanning your entire data model. This type of data access could turn out to be actually a good thing, because these very close relationships will force an application to have a cleaner design since you will not be able to rely on monster SQL statements for compensating for design deficiencies.

In an application which needs to handle massive amounts of data relations between data will be delegated to activities which require them (reporting is a pretty good example) and data will be streamed from the key-value stores to the relational databases where these activities take place.
All in all, a pretty good division of tasks: applications with a (hopefully) cleaner model relying on key-value stores streaming data to relational databases for reporting.

None of the above means that relational databases will disappear: the vast majority of applications do not need to process the massive amounts of data which render your typical database impractical. Relational databases will be around for quite a while.

* Obviously, there are database clusters (such as Oracle RACs), but they are pretty costly.

** Examples of key-value stores are Google’s Bigtable, Amazon’s Dynamo and IBM’s WebSphere eXtreme Scale.

*** Actually, there are constraints attached to it. In order to get the best performance you need to model your application so that data access occurs within the same node in order to avoid a distributed transaction spanning across multiple nodes. Please read this article by Billy Newport from IBM for a better understanding.

Tags: Development, Management

31 March 2009 - 20:01
Assigning responsibilities

I think that this presentation by Rebecca Wirfs-Brock on driving the design from responsibilities pretty much hits the nail on the head: responsibilities drive design and good design is a good separation of responsibilities. From what I have seen most of the errors in software development arise either from grey zones in which multiple responsibilities are implemented and overlap or from responsibilities which spread across multiple components. Both are examples of incorrect mapping of responsibilities to components.

All in all, a very good presentation.

Tags: Development

23 September 2008 - 19:34
OOP inheritance

Typically OOP inheritance is used for sharing various common methods and a typical example would be something like this:
You have an interface Vehicle which has the sub-classes Bicycle and MotorVehicle. MotorVehicle in turn has the sub-classes Car and Truck, each with its specifics. Car, Truck and Bicycle are all sub-classes of Vehicle and as such all inherit the same common behavior, such as some object which specifies the speed limits within which each Vehicle should operate (for example a Car should be able to handle higher speeds than a Truck or a Bicycle).
Fair enough, and this way to use OOP inheritance has its reasons for use.

However, one other reason for using inheritance that I see is for creating substitutes in order to partition code bases. In this case you are using sub-classes not because you need some shared behavior to be encapsulated in a super-class, but because you have distributed teams working on the same codebase and you need to minimize collisions between them (a collision would be 2 or more people working on the same file and is usually resolved by a merge in CVS). Solving conflicts thru CVS merges is a pretty error-prone process and doesn’t scale well, so at some point it becomes necessary to partition development so that different teams can work on the same project without elbowing each other.
Sub-classing in this case would solve this problem by letting developers substitute their team’s type for another team’s type.

One example is using helper classes which service a POJO and which are used by more than one team:
Let’s say that we have a TaxProcessor class that uses a TaxCalculator for calculating the taxes to be applied to a trade. Let’s say that you have 2 teams, one working on taxing bond trades and one working on taxing equity trades, and that they are both using the same TaxCalculator class in the TaxProcessor. TaxCalculator has the method taxTrade which delegates to taxBondTrade and taxEquityTrade. As time goes on a lot of fixes are put into taxTrade in order to deal with various corner-cases, to the point where 1) TaxCalculator becomes hard to maintain and 2) more than one developer is working on the same method at the same time. A solution would be to sub-class TaxCalculator into BondTaxCalculator and EquityTaxCalculator, have each implement the method taxTrade according to its specifics and then have TaxProcessor use the appropriate class according to the trade type.
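The split can be sketched as follows. The sketch is in Python rather than Java, so taxTrade becomes tax_trade; the tax rates, the integer amounts in cents and the string trade types are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class TaxCalculator(ABC):
    @abstractmethod
    def tax_trade(self, notional_cents: int) -> int:
        """Return the tax due, in cents, for a trade of the given size."""


class BondTaxCalculator(TaxCalculator):
    RATE_BPS = 10  # hypothetical rate: 10 basis points

    def tax_trade(self, notional_cents: int) -> int:
        # Owned by the bond team; bond corner-cases would live here.
        return notional_cents * self.RATE_BPS // 10_000


class EquityTaxCalculator(TaxCalculator):
    RATE_BPS = 20  # hypothetical rate: 20 basis points

    def tax_trade(self, notional_cents: int) -> int:
        # Owned by the equity team; equity corner-cases would live here.
        return notional_cents * self.RATE_BPS // 10_000


class TaxProcessor:
    def __init__(self):
        # The processor only picks a calculator by trade type.
        self._calculators = {
            "bond": BondTaxCalculator(),
            "equity": EquityTaxCalculator(),
        }

    def process(self, trade_type: str, notional_cents: int) -> int:
        return self._calculators[trade_type].tax_trade(notional_cents)
```

Each team now edits its own class (and, in practice, its own file), so a fix for a bond corner-case no longer touches code the equity team is working on.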

As you can see from the above, good software models imply good separation of concerns. Good separation of concerns translates into good separation between development teams working in parallel on the same project. Good separation between development teams results in low transaction (or interaction) costs between teams, in a more efficient way to delegate work and in a development environment which scales better.

Tags: Development, Management

28 August 2008 - 13:13
Technical debt

You are probably familiar with Martin Fowler’s Technical Debt, a metaphor around the idea that doing things the quick-and-dirty way creates bad code, bad code which can be viewed as a debt which needs to be paid later in installments (the principal is the re-factoring of the bad code and the installments are the extra effort that this bad code forces on development). The concept was originally introduced by Ward Cunningham.

Technical Debt is pretty much different from ordinary, everyday debt primarily because there is no creditor and no maturity date. Also, unlike regular debt, it is very hard to quantify. Technical Debt seems a bastardized version of regular debt in the sense that some development costs are identified as the principal (the re-factoring of the bad code) and some as the interest (the hacks used for dealing with this bad code) and it is built by taking some concepts out of the ordinary debt concept as the author sees appropriate in order to underline some development costs.
At the same time Tech Debt is defined exclusively from the point of view of the borrower who has to worry about paying the principal and the interest leaving out other actors (even though once you take the creditor out of the picture you have to wonder why you have to pay this debt at all ;-)). But, like Martin Fowler said at the beginning of his article, this is a metaphor, which leaves a lot of room for stretching various concepts. Fair enough.

However, Technical Debt seems to have been taken up and spawned quite a few siblings as you can see in this infoq article: liquid assets, moral hazard, fertile assets (???) etc…, which are either unfortunate enough to bear little or no resemblance to the original economic concepts they try to refer to or unfortunate enough to not refer to anything (fertile assets stands out as a prime example).
Let’s pick for example “Liquid Asset” which is defined this way: Perhaps the term “technical debt” focuses us on the wrong things; maybe focusing on the converse, on the investment side of things might be more effective.
First of all, a real-world definition: a liquid asset is an asset for which transaction costs are low and for which there exists a market in which this asset can be sold and bought for a reasonable fee. Try to look into the above definition (or in this link which is provided next to it in the infoq example) and if you can spot the asset that is referred to, as well as the market in which this asset is sold and bought and the fees for these transactions, please let me know.
Second, switching terms in order to motivate people displays how shallow these concepts are to begin with: if Technical Debt were a concept that relates to something tangible in the real world, if it managed to encapsulate a real problem or some real costs, then Tech Debt should not be swept under the rug for fear of undermining the developers’ morale, but rather dealt with up-front because in doing so you would solve a real problem. The fact that Tech Debt is used this way points to the fact that it is so vaguely defined that few people can extract something valuable out of it and that it can be stretched in any direction you want.

What I see when I am reading the infoq article mentioned above are the first steps in the attempt to “marry” 2 fields which are pretty different from one another, economics and writing code, and the excesses which appear when interest starts to pick up in some vague concept (excesses similar to those which followed the introduction of the SOA concept). There is a desire to bring IT development under control and one way to look at it is to define costs and benefits associated with various actions and then apply economics to these costs and benefits in order to write code more efficiently. I don’t know where this will lead because we are at the beginning of exploring this.

As far as I am concerned, I developed an interest in economics a while ago and I read about it as much as I can. I have also tried to “marry” the domains of writing code and of economics, I even have an Econo-computing category, but I find it pretty hard. So far the only economic concept I have found that can be applied pretty well to software development is transaction costs, because they actually encapsulate pretty well some very real costs which arise when various entities are interacting with one another.

I will be following this Econo-computing field with a lot of interest and, who knows, maybe the concepts in this field will actually start to relate to something tangible in the real world and will improve developer productivity.

Tags: Development, Econo-computing

1 August 2008 - 23:12
EDA, clustered caches and triggers

A while ago I went to a presentation where the speaker was talking about the migration of an application from a typical MDB-based work-flow application to an EDA-based application. One of the drivers of this migration was the fact that their application’s data had grown so large that the database storing it was having performance problems. One of the solutions which were studied was to move the data tier into a clustered, transactional cache which would update the database asynchronously, essentially reducing the database to a simple store of data on which you could run reports.

Not a bad idea: the data would reside in this clustered cache and it would be made available to the application. One other thing that was considered was turning the whole architecture on its head and moving from the work-flow application to an EDA application. This is how it was supposed to work: the clustered cache came with the capability of attaching events to data as the data was handled (inserts, updates, deletes, etc…). This would mean that while you previously had to have an MDB that was listening for a stock-quote request, you could now code a stock quote object which you would throw into this clustered cache and which could listen for events. The clustered cache would both have flushed the stock quote objects to a database and have called events on them. The stock quote object would in turn update other objects (let’s say positions for that stock) and while these positions got updated they would fire more events in turn. A part of the business logic would be stored in these events and in the message passing that occurs between them, in a sort of event-driven architecture.
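The cascade described above can be mimicked with a toy in-memory cache in Python. The EventedCache class, the "type:symbol" key format and the revalue_positions listener are assumptions made for illustration; they are not the API of the clustered cache product discussed in the presentation, and the asynchronous flush to the database is left out.

```python
from collections import defaultdict


class EventedCache:
    def __init__(self):
        self._data = {}
        self._listeners = defaultdict(list)   # key prefix -> listeners
        self.log = []                          # order in which keys were written

    def on_put(self, prefix: str, listener) -> None:
        self._listeners[prefix].append(listener)

    def put(self, key: str, value) -> None:
        self._data[key] = value
        self.log.append(key)
        # Fire the listeners registered for this kind of object.
        prefix = key.split(":", 1)[0]
        for listener in self._listeners[prefix]:
            listener(self, key, value)

    def get(self, key: str):
        return self._data[key]


def revalue_positions(cache: EventedCache, key: str, price: int) -> None:
    # A quote update rewrites the position, which may fire further events.
    symbol = key.split(":", 1)[1]
    quantity = cache.get(f"holding:{symbol}")
    cache.put(f"position:{symbol}", quantity * price)
```

Registering `revalue_positions` for the "quote" prefix means that a single `put` of a quote silently writes a position as well; the business rule lives in the listener and in the chain of events, not in any code path the caller can see.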

The more I think about this, the more I get the impression that these events are nothing more than triggers in a database (the clustered cache taking on the role of the database). And coding your business logic in triggers is not a good thing if you listen to people with experience in database programming. Relationships between various entities and the interactions between them should probably be modeled at a higher level than the object level; when you are coding business logic in triggers you are essentially limited to that object’s scope.

Now, the idea of putting the business logic into these triggers comes from a pretty hard problem: the fact that for most architectures out there the data store (the database which holds the data and which needs to be consulted by the business-logic tier) is physically away from the code that is implementing the business logic. This physical distance and the performance problems that come with it have been bridged in various ways: pushing the business logic into the database in the form of stored procedures, adding various caching mechanisms so that data is pushed towards the middle-tier, etc… In the example above part of the reason for pushing the business logic into these events was that you would effectively push the business logic into the new data-tier (the clustered cache in this case, which would trigger these events). I would say that this is a pretty interesting approach with some benefits, but attention should be paid to minimizing the number of events (triggers) which get implemented.
A decade of PL-SQL development would recommend this…

Tags: Development