29 November 2010 - 12:39 Conway’s Law

I read today a very interesting post on Dan Pritchett’s blog about how the IT architecture of a large organization is affected by the structure of that organization. For a while I have been growing convinced that most decisions regarding IT are determined by the underlying organization, but I had not seen it stated so clearly. It is an important statement because it identifies what influences IT architecture the most.

However, I don’t think that the solution Dan Pritchett proposes for correcting IT inefficiencies can be applied easily to any organization. I see the IT architecture driving the structure of the organization, rather than the reverse, happening mostly in IT-centric companies (*). I don’t see an insurance company or a distribution chain changing the way they do business solely for the purpose of enhancing the efficiency of IT. For a typical company IT is a sunk cost rather than a strategic asset (**), so restructuring the company in order to minimize this particular set of costs, possibly at the expense of other concerns, does not make much sense.
Adapting the IT infrastructure to the organization, and not the reverse, will be the norm for the foreseeable future. In most cases the best that IT can do is to reduce costs by automating various tasks across the organization rather than try to define the organization.

Later Edit: Another reason why aligning the organization with the architecture (rather than the opposite) does not apply all the time is that refactoring the architecture implies re-structuring the organization in order to align it with the new architecture. The already high costs of re-architecting the IT infrastructure may become prohibitive.

* I would call an IT-centric company a company that proposes services built around IT or that derives a competitive advantage from its IT infrastructure. Examples of such companies are eBay, Amazon, Google, online shops, online media companies, etc… In these types of companies we may derive the organization from the architecture of IT systems because in doing so we enhance the competitive advantage of that company.

** For IT-centric companies this does not apply; in their case IT is a strategic asset.

No Comments | Tags: Management

19 September 2010 - 12:10 Domain event driven architecture

This presentation on domain event driven architecture by Stefan Norberg is a must-see for anyone interested in loosely-coupled distributed systems. The solution of implementing cross-cutting concerns over a large number of heterogeneous systems by having these systems issue domain events is one of the most elegant I have seen.

A point that I found very interesting is that systems must communicate via events and not via commands. For a system (master system A) to issue a command on another system (executing system B) it needs to know the context in which the command should be issued on system B. Requiring system A to have knowledge about the inner workings of a remote system in order to be able to carry out its tasks is typically how tight coupling between systems starts.
The solution outlined by Stefan Norberg is to split the business logic for a specific domain into three parts: gathering the data required for executing a certain task, transforming it into events and issuing those events (performed by multiple systems); channeling the events to the system implementing the domain (typically performed by an ESB); and implementing the domain itself (performed by the system which receives the events from the ESB and uses them either for creating domain-specific commands or for creating the context in which domain-specific commands are executed). This partition of tasks scales very nicely as the number of systems and domains grows.
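
To make the events-versus-commands split a bit more concrete, here is a minimal sketch in Python. It is not taken from the presentation: the EventBus class is a hypothetical in-process stand-in for the ESB, and the LoyaltySystem class, its point rule and all names are assumptions made for illustration. The point is only that the publishing side states what happened, while the system owning the domain decides which command that event turns into.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class UserPlacedBet:
    """Domain event: a fact about what happened, not an instruction."""
    user_id: str
    amount: int


class EventBus:
    """Hypothetical stand-in for the ESB: routes events by their type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        for handler in self._handlers[type(event)]:
            handler(event)


class LoyaltySystem:
    """Owns the loyalty domain and decides which command an event triggers."""

    def __init__(self) -> None:
        self.points: Dict[str, int] = {}

    def on_bet_placed(self, event: UserPlacedBet) -> None:
        # The translation from event to command lives here, not in the publisher.
        self.award_points(event.user_id, event.amount // 100)

    def award_points(self, user_id: str, points: int) -> None:
        # Domain-specific command (the 1-point-per-100 rule is made up).
        self.points[user_id] = self.points.get(user_id, 0) + points


bus = EventBus()
loyalty = LoyaltySystem()
bus.subscribe(UserPlacedBet, loyalty.on_bet_placed)

# The bet processing system only states what happened; it knows nothing
# about loyalty points or about who is listening.
bus.publish(UserPlacedBet(user_id="u1", amount=5000))
```

Adding another consumer of UserPlacedBet in this sketch means one more subscribe call and no change on the publishing side, which is where the loose coupling comes from.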

Another point that I found interesting was that this architecture requires the delivery of events in the order in which they were created. Some events are used for creating the context in which other events are transformed into commands, and therefore the events that create the context need to be processed before the events which are transformed into commands.
Let’s give an example. Suppose that the systems involved are a user management system, a bet processing system and a system managing user loyalty. Let’s say that we have this use case: a user creates an account and then bets $5000 on a horse race. For the sake of argument let’s assume that this use case involves the generation and dissemination of 2 events: UserCreatedEvent and UserPlacedBet. Now let’s suppose that the system managing user loyalty receives the 2 events and is ready to process the UserPlacedBet event by checking the size of the bet and carrying out the appropriate command. Let’s say that for this command to be executed the user must exist in this system’s database. It follows that the UserCreatedEvent should have been processed before the UserPlacedBet event in order to create the context (in this case the context consists of relevant user data) in which the UserPlacedBet event can be turned into a command and executed.
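
A minimal sketch of the ordering requirement from this example, in Python. It assumes that every event carries a sequence number assigned when it is created (the seq field is my assumption, as are the class names and the bet threshold; the presentation does not prescribe a mechanism). The consumer buffers events that arrive early and only processes them once all earlier events have been handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Event:
    seq: int       # hypothetical sequence number assigned at creation time
    name: str      # "UserCreatedEvent" or "UserPlacedBet"
    user_id: str
    amount: int = 0


class LoyaltyConsumer:
    """Processes events strictly in creation order, buffering early arrivals."""

    def __init__(self) -> None:
        self.users: Set[str] = set()
        self.rewarded: List[str] = []
        self._next_seq = 1
        self._pending: Dict[int, Event] = {}

    def receive(self, event: Event) -> None:
        self._pending[event.seq] = event
        # Drain the buffer for as long as the next expected event is available.
        while self._next_seq in self._pending:
            self._handle(self._pending.pop(self._next_seq))
            self._next_seq += 1

    def _handle(self, event: Event) -> None:
        if event.name == "UserCreatedEvent":
            # Context-creating event: store the user data locally.
            self.users.add(event.user_id)
        elif event.name == "UserPlacedBet":
            # Command-triggering event: it needs the context to exist already.
            if event.user_id not in self.users:
                raise LookupError(f"unknown user {event.user_id}")
            if event.amount >= 1000:  # made-up threshold for a reward
                self.rewarded.append(event.user_id)


consumer = LoyaltyConsumer()
# The bet arrives first, but it is held back until the user exists.
consumer.receive(Event(seq=2, name="UserPlacedBet", user_id="u1", amount=5000))
consumer.receive(Event(seq=1, name="UserCreatedEvent", user_id="u1"))
```

Without the buffering, the first receive call would hit the LookupError branch, which is exactly the failure the ordering requirement is meant to prevent.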

This presentation covers the use of domain events from a large number of angles (enterprise architecture, management, physical implementation) and should be watched by anyone interested in large scale distributed systems.

No Comments | Tags: Development, Management

31 July 2010 - 16:11 Domain driven development and agile methods

Watching Eric Evans explain how to incorporate agile methods into domain driven design gives you an understanding of some of the differences and commonalities between domain driven development and agile methods.

Agile methods grew out of the fact that up-front design, which involved months of analysis, was getting out of synch with a world which required faster development cycles and the ability to adapt to changes happening at high speed (*). However, the responsiveness of agile methods and their focus on the next iteration limited the scope at which development is carried out and eliminated the design phase from the development process. The result was applications for which development slowed, if not stopped altogether, once the application reached a certain complexity.

Eric argues that in order to fix this problem we need to bring domain driven design back into the picture, but in a way different from the top-down, process-heavy manner in which design was conducted before. The main reason for modelling an application iteratively is that the development of an application is a learning process, a process of discovering and annotating a particular domain.

Some agile techniques, such as the ability to react swiftly to changes and to perform significant changes late in the game, are presented, but they seem to me to be liabilities rather than assets. Performing a significant change late in the game is a very expensive operation since it is essentially a re-organization of the concepts encapsulated in the model followed by the confusion generated by the dissemination of the new model; such things should be avoided at all costs. Rapid changes in the model are also bad since they can create confusion.

One agile practice which I think benefits domain driven design is the break-down of the design process into iterations during which new domain concepts are inserted into the model. It is very important that these iterations also make sure that the model stays in synch with the domain in order to avoid late, massive and expensive domain re-factorings. In order to do this Eric outlines a series of diagnostic measures used for detecting when the model is straying away from the domain.
While these measures are necessary, what is missing in this process is a way for the domain expert to validate the model. The knowledge transfers have no feed-back loop from the domain experts. The developer seems to be the sole owner of the model; ideally the domain expert would become more engaged in the definition of the model rather than remain a passive disseminator of knowledge.

Modelling brand new domains or domains in a state of change will probably always be error-prone because the development of an application in such a domain is a discovery process for all the parties involved (from users to business analysts to developers) whose end-result is defining the domain. In such an environment probably the only way to define and disseminate domain knowledge is to set up sessions with domain experts. However, knowledge transfers in mature domains should be handled differently in order to leverage the existing knowledge. It is a good idea to get into the domain by first reading up on it and then engaging in knowledge-transfers with domain experts.

Eric Evans covers a lot of ground in his talk, from the methodology of transferring domain knowledge to diagnosing misalignments between the model and the domain, solutions for correcting such misalignments and strategies for coordinating teams working on the same project. However, I wish he had talked a bit about the structure and skill-sets of the teams involved in domain-driven development, about the ways to disseminate domain knowledge into a team and about how to engage the domain experts in reviewing and providing feedback on the model being developed.
All in all a very good presentation which I encourage people to watch.

* The rejection of the analysis phase by agilists could partially be explained by the fact that the agile movement started at a point where brand new domains appeared (such as e-commerce, e-marketing, content management) which were in a continuous state of transformation and for which there was no prior knowledge. Analysis in this case was viewed, to a certain extent correctly, as a process which requires significant investments (in order to overcome the lack of knowledge) without a clear return (the results could be obsolete in a few months due to unforeseen changes).
At the same time the revolution in communications created the opportunity to add stakeholders previously considered unrelated to the domain. Encapsulating requirements from unforeseen stakeholders in an application meant that the domains became more prone to change and harder to predict.
These factors contributed to the misconception that analysis hinders rather than helps.

No Comments | Tags: Development, Management

24 May 2010 - 10:03 Managing complexity with regression tests

In this post I will go briefly over some important issues that I have come across in my experience regarding regression testing and how it can be used for managing complexity.

A batch of regression tests covering an appropriate part of an application is a very good strategy for dealing with growing complexity. Furthermore, the more complex the application the higher the need for regression tests, because as systems become more complex the components that make them up tend to become unwieldy (*). In a heterogeneous team with a high variation in skill sets the regression tests are probably best written and carried out by the most experienced members of the team (it goes without saying that the implementation of some functional aspect and the test covering it should be done by 2 separate persons). Having the most experienced team members write the tests makes the tests more expensive, so care should be taken regarding their scope. I generally consider that regression tests should cover higher-value use-cases at a higher scope and that this scope has implications for the testing infrastructure.

The tests themselves should be isolated from testing infrastructure concerns and should focus mostly on the domain that is being covered. A pretty good discussion of these items can be found here. However, the tests should also be run in an environment as close to the production environment as possible. I have seen an instance where tests covering an application deployed on a legacy database were passing without inserting a single record in the DB, only for the application to have problems later in PROD because of legacy data structures.

Manual regression tests are expensive and this high cost has some bad side effects apart from the higher price-tag: manual regression tests tend to be executed late in the release process. Also, because they take longer (being manual), they tend to cut deeper into the safety buffer dedicated to fixing bugs found during the release process. To solve this problem the regression tests should be in a format which allows them to be carried out early in order to detect problems early, ideally as close as possible to the development itself. This requires that the regression tests have a high degree of automation and a certain level of integration into the development process.

Ideally, automated regression tests should be written in a format that ensures that the tests can be manipulated with ease both by developers (in order to be able to run them at an early stage in the development process) and by domain experts/business analysts (in order to prove their validity). A solution which may satisfy these 2 different audiences could be regression tests written in the programming language of the application by experienced team members and which are readable by domain experts (**). For complex domains the testing infrastructure may need to be adapted in order to satisfy the second point as well.
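
As a rough illustration of what such a test might look like, here is a small sketch in Python using the standard unittest module. The business rule (loyalty_discount) and its numbers are entirely made up; the only point is that the test names and the named arguments state the expected behaviour in the vocabulary of the domain, so that a domain expert can check the expectations without reading the implementation.

```python
import unittest


def loyalty_discount(order_total: float, years_as_customer: int) -> float:
    """Hypothetical business rule under test: 10% off after 5 years."""
    rate = 0.10 if years_as_customer >= 5 else 0.0
    return round(order_total * rate, 2)


class LoyaltyDiscountRegression(unittest.TestCase):
    # Each test name is a sentence a domain expert can confirm or reject.

    def test_customer_of_five_years_gets_ten_percent_off(self):
        discount = loyalty_discount(order_total=200.0, years_as_customer=5)
        self.assertEqual(discount, 20.0)

    def test_new_customer_gets_no_discount(self):
        discount = loyalty_discount(order_total=200.0, years_as_customer=0)
        self.assertEqual(discount, 0.0)


def run_regression_suite() -> unittest.TestResult:
    """Run the suite programmatically, e.g. from a build step."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(LoyaltyDiscountRegression)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the suite is plain code it can be run by developers on every build, which addresses the earliness concern above; whether a domain expert can really read it depends heavily on how much infrastructure noise is kept out of the test bodies.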

* Components become out of synch with the required functionality as the complexity that they need to cover grows. Most of the time this misalignment is due to the fact that the development pace is too fast and the information for correctly re-designing components for new functionality is unavailable. As components become unwieldy the need for refactoring appears and with it the need for regression tests, which are a key component in the refactoring process.

** In practice, however, it is pretty costly to create a test environment that satisfies these 2 different audiences. This is why automated regression tests are typically written in the language used for development, with senior team members acting as domain experts for writing tests or validating tests written by other team members.

No Comments | Tags: Management

23 April 2010 - 12:31 DDD and IT management

This InfoQ presentation on DDD reminded me of some opinions that I wrote down a while ago about DDD and about how it can be effectively applied in a typical development environment.

From my experience there are a few obstacles to using DDD in a typical development environment and one of them is the vagueness which accompanies the definition of different artifacts (entities, value objects, etc…), vagueness which sometimes may generate conflicts (*). The same vagueness works against the size of the development team and the distance between developers (**). DDD is probably best applied in settings which minimize both the potential conflicts and additional overhead of creating and distributing new concepts: small teams of experienced developers in close geographical proximity.
DDD projects tend to exhibit a high degree of closeness between the code and the domain which is a pretty good thing. Code which binds to the domain closely tends to “document” the domain and drive down the time for making knowledge transfers.

Given that DDD is best applied at the high end of the IT workforce it follows that it can act as a big differentiator in the productivity of development teams, as it further separates the productivity of experienced developers from that of less experienced developers. However, the gains in productivity may be counter-balanced by the reduced size of the available developer pool and the higher costs and dependencies associated with it.

I would conclude with the observation that the current way of promoting DDD probably works against it. Typically DDD is portrayed as an art-form or craft available to the lucky few who can appreciate and exploit its finesse. This is a pretty good way of driving the typical IT project manager away, since IT managers really dislike relying on artisans rather than on professionals.
For DDD to be applied at a larger scale it needs to be accepted by the business and for this it needs a more “professional” image. Rather than focusing on the “beauty” of various DDD patterns the DDD community should probably focus a lot more on how to better formulate and deliver DDD best practices to a large audience. Otherwise it risks remaining at the fringes of software development.

* These conflicts are usually generated when developers cannot agree on a particular design. Given that DDD pushes design right into the core of the software development process these conflicts are bound to appear more often. In order to minimize these conflicts a certain level of experience in software modeling, of communication skills and, to a certain degree, of negotiation skills is needed. All this, in turn, requires experienced developers. It may also require having a domain expert handy (which is typical of DDD projects, where the domain expert is a key person in the development process) in order to proof-read the design.

** Size and geographical proximity of teams are determined by the costs of effective communication (costs which usually increase with size and distance). To these communication costs we need to add the costs of debating vague concepts, of defining them and disseminating the knowledge required for manipulating them. These additional communication costs come at the expense of the usual communication costs and require them to be reduced.

No Comments | Tags: Management

29 March 2010 - 13:31 Replaying requests in flows

If you have followed various presentations on Event Driven Architecture for a while you must be familiar with one advantage that many people talk about without going into detail: the ability to recover from crashes simply by re-playing events that were sent to your system. Most presentations give the impression that a flow-based system based on passing messages is born with this ability, but the reality is that the system must be designed to provide such functionality.

When designing such a system you first have to ask yourself if you need this ability and I would say that the answer in quite a few cases is yes. The most basic recovery from crashes for a flow-based system consists of the message broker booting up, determining what messages have to be sent and re-sending the messages to the message consumers. Chances are that re-sending the exact messages that caused the crash will cause another crash and in order to avoid this you should be able to wipe out the message store, determine what events need to be re-played and replay them in an orderly fashion till your system goes back to its normal state.

Next, you should determine how to design such a system. One way to design it would be to code the stages as idempotent operations, that is, operations which when carried out multiple times give the same results. However, sometimes the stages of the model of the system are not easily captured in idempotent operations and sometimes it is downright impossible (*).
Another way to design it would be to break the flow into stages and the stages into 2 categories: idempotent stages and non-idempotent stages (**). Next, record the requests that come in and record each stage that a request has completed successfully. For non-idempotent stages also record the state of the request after their completion. Replaying requests in such a system consists of determining what requests are in non-idempotent stages and replaying them from these stages. For example, let’s say that you have a system that accepts orders, performs matchings on them (matches buys to sells), creates fills out of these matches and sends the fills out to an outside application. This system has 4 stages: order receival, order matching, fill creation and fill forwarding. Let’s say that order receival is a non-idempotent operation, order matching is an idempotent operation, fill creation is a non-idempotent operation and fill forwarding is another non-idempotent operation. In order to design such a system for re-play you will need to track a request across all the stages, determine what requests are in non-idempotent stages and, when a replay is needed, replay those requests starting from the non-idempotent stages.
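
Here is a minimal sketch of this bookkeeping in Python, using the 4 stages from the example. It reflects my reading of the scheme rather than a reference implementation: a journal records every completed stage, keeps a snapshot of the request state only after non-idempotent stages, and a replay resumes right after the last completed non-idempotent stage (idempotent stages after that point are simply run again). The Stage and Journal classes, the run_flow function and the crash_before switch are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]


@dataclass
class Stage:
    name: str
    idempotent: bool
    run: Callable[[State], State]


@dataclass
class Journal:
    """Per request: completed stages, with a state snapshot only after
    non-idempotent stages (None for idempotent ones)."""
    completed: Dict[str, List[Tuple[str, Optional[State]]]] = field(default_factory=dict)

    def record(self, request_id: str, stage: Stage, state: State) -> None:
        snapshot = None if stage.idempotent else dict(state)
        self.completed.setdefault(request_id, []).append((stage.name, snapshot))

    def resume_point(self, request_id: str, initial: State) -> Tuple[int, State]:
        """Index of the first stage to (re)run and the state to start from."""
        done = self.completed.get(request_id, [])
        for position in range(len(done) - 1, -1, -1):
            snapshot = done[position][1]
            if snapshot is not None:
                # Forget what was recorded after the snapshot; it gets redone.
                self.completed[request_id] = done[: position + 1]
                return position + 1, dict(snapshot)
        self.completed[request_id] = []
        return 0, dict(initial)


executed: List[str] = []   # trace of every stage execution, for illustration
sent: List[object] = []    # stands in for the outside application


def forward_fill(state: State) -> State:
    sent.append(state["fill_id"])
    return {**state, "forwarded": True}


STAGES = [
    Stage("order receival", False, lambda s: {**s, "received": True}),
    Stage("order matching", True, lambda s: {**s, "matched_with": "sell-7"}),
    Stage("fill creation", False, lambda s: {**s, "fill_id": f"fill-{s['order_id']}"}),
    Stage("fill forwarding", False, forward_fill),
]


def run_flow(request_id: str, initial: State, stages: List[Stage],
             journal: Journal, crash_before: Optional[str] = None) -> State:
    start, state = journal.resume_point(request_id, initial)
    for stage in stages[start:]:
        if stage.name == crash_before:
            raise RuntimeError(f"simulated crash before {stage.name}")
        state = stage.run(state)
        executed.append(stage.name)
        journal.record(request_id, stage, state)
    return state


journal = Journal()
order = {"order_id": "42"}
try:
    run_flow("r1", order, STAGES, journal, crash_before="fill creation")
except RuntimeError:
    pass  # the crash leaves the request between matching and fill creation

final_state = run_flow("r1", order, STAGES, journal)  # replay after the crash
```

In this run the replay skips order receival (its result was journaled), repeats order matching (safe, since it is idempotent) and then carries on with fill creation and fill forwarding. The sketch does not cover a crash in the middle of a non-idempotent stage, which is where the hard cases mentioned in the first footnote start.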

Replaying requests could also help you release a new version of the system in which the classes of objects which are sent from one stage to another change. Typically when such a release is carried out any message in transit cannot be processed anymore because of class versioning exceptions. Adding the option of replaying the requests after updating the flow with the latest release would help solve this problem.

* One example of an operation which may not be able to be made idempotent is sending messages to an external party. For this operation to be idempotent it would be necessary for the external party to be idempotent (that is, it would mean that the same message sent multiple times to the external party would have the same effect). This assumption sometimes turns out to be invalid.

** One example of an idempotent stage is a stage that performs some transformation/computation on the messages it receives and that forwards the messages to another stage. One example of a non-idempotent stage is a stage that persists data to a datastore or sends messages out to a non-idempotent external application.

No Comments | Tags: Development, Management

17 December 2009 - 17:10 Replication of data and replication of functionality

In my experience replication of data usually helps deal with performance problems, but sometimes this replication drags replication of functionality along with it. You end up replicating some data on a different system (let’s call this the client system) and then you find yourself needing to code on the client system some piece of business logic which resides on the system where the data originally comes from (let’s call this the master system).

More often than not this is a pretty bad scenario, because every modification to the business logic done on the master system has to be re-coded on the client system. Sometimes you find that you need to coordinate the releases of these 2 separate systems in order to make the whole picture functionally consistent. One way to avoid re-coding the business logic on the client system is to expose the business logic on the master system thru some remoting mechanism (EJBs, Hessian, etc…) and then have the client system pass the data that it replicated locally to the master system in a synchronous call. While this solves the problem of having the same business logic reside in 2 places it makes the client system dependent on the master system (*).

One solution to the above conundrum is to replicate the data, along with the results of carrying out that piece of business logic on that data, to the client system. With this arrangement the client system does not need to have the master system up and running and it also does not need to re-code the same business logic inside. If you manage to use this (this approach doesn’t work in all cases) you will find yourself exporting business logic asynchronously by exporting both data and the results of carrying out that business logic on the data.
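
A small sketch of this arrangement in Python. Everything in it is hypothetical (the customer tier rule, the class names, the in-memory export standing in for whatever replication mechanism is actually used); it only shows the shape of the idea: the rule exists once, on the master, and its result travels with the replicated row.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReplicatedCustomer:
    customer_id: str
    yearly_spend: int
    tier: str  # result of the master's business logic, shipped with the data


class MasterSystem:
    def __init__(self) -> None:
        self._spend: Dict[str, int] = {}

    def record_spend(self, customer_id: str, yearly_spend: int) -> None:
        self._spend[customer_id] = yearly_spend

    def _tier(self, yearly_spend: int) -> str:
        # The only copy of the (made-up) business rule.
        return "gold" if yearly_spend >= 10_000 else "standard"

    def export(self) -> List[ReplicatedCustomer]:
        # Replicate the data together with the computed result.
        return [
            ReplicatedCustomer(cid, spend, self._tier(spend))
            for cid, spend in self._spend.items()
        ]


class ClientSystem:
    def __init__(self) -> None:
        self._replica: Dict[str, ReplicatedCustomer] = {}

    def load(self, rows: List[ReplicatedCustomer]) -> None:
        for row in rows:
            self._replica[row.customer_id] = row

    def tier_of(self, customer_id: str) -> str:
        # No call to the master and no re-coded rule: just read the result.
        return self._replica[customer_id].tier


master = MasterSystem()
master.record_spend("c1", 12_000)
master.record_spend("c2", 500)

client = ClientSystem()
client.load(master.export())
```

The price is that the client sees the result as of the last export, so a change to the rule on the master only reaches the client with the next replication run. That is one of the reasons the approach does not fit all cases.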

If I don’t write another post till January I wish you a Happy New Year!!!!!

* There is a physical dependency (the client system needs the master system up and running for it to service calls) and possibly a library dependency which may couple the release cycles of these 2 systems (if the client system communicates with the master system thru objects rather than thru wire protocols you will need to synchronize the release cycles of these 2 systems in order to make sure that the client system doesn’t run in PROD with obsolete libraries).

No Comments | Tags: Management

26 September 2009 - 11:34 When knowledge is a liability and not an asset

Typically knowledge is considered to be an asset, some going as far as saying that knowledge is power. Well, when you design interactions between systems it is better to avoid having systems know too much about each other. Having deep knowledge about a system with which you are integrating is both a moral hazard (it gives you the opportunity to hack your way out of problems) and a source of high transaction/interaction costs (typically the costs of transferring this deep knowledge to someone else).
The same reasoning goes for interaction between modules in a single system; please check out this article.

P.S. I was shocked to see the date on that article: 1971. The fact that we are still struggling with these problems shows a pretty big problem with computer science. Most people equate computer science with exotic algorithms; it is about time to add software modelling to this equation. It is pathetic when useful concepts for writing software developed almost 40 years ago are still unknown to the typical developer.
Software development has a pretty big educational problem (most developers educate themselves on their own and learn on the job) and it would not be a bad idea to address it.

No Comments | Tags: Management

2 July 2009 - 16:47 Key-value stores and relational databases

Relational databases are going thru a rough patch right now, some pundits going as far as writing them off. Their main problem is the fact that they are constrained to run within the same physical box (*) and scaling out is pretty hard. Once the datasets reach a certain size you will probably need to look beyond the typical relational database at other ways to store your data.

One type of alternative datastore that is getting a lot of attention is the distributed key-value store, which maps values to keys and then assigns keys to multiple storage nodes according to a hash function (**). In such a setting you get an object thru a key, you work with it and then you save it back in the datastore. The fact that this configuration scales out easily (***) makes this datastore very appealing, and if you are working with large datasets you will probably have no choice but to use something like it.
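
To illustrate the key-to-node assignment, here is a toy sketch in Python in which plain dictionaries stand in for storage nodes. It uses simple hash-modulo placement, which is an assumption made for brevity: real stores typically use more elaborate schemes (consistent hashing, for instance) so that adding a node does not move most of the keys. The PartitionedStore class and its methods are hypothetical names.

```python
import hashlib
from typing import Any, Dict, List


class PartitionedStore:
    """Toy key-value store: each key lives on exactly one in-memory 'node'."""

    def __init__(self, node_count: int) -> None:
        self.nodes: List[Dict[str, Any]] = [{} for _ in range(node_count)]

    def node_for(self, key: str) -> int:
        # A stable hash, so every client maps the same key to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key: str, value: Any) -> None:
        self.nodes[self.node_for(key)][key] = value

    def get(self, key: str) -> Any:
        return self.nodes[self.node_for(key)][key]


store = PartitionedStore(node_count=4)

# Get an object thru its key, work with it, save it back.
store.put("user:42", {"name": "Ann", "bets": 0})
user = store.get("user:42")
user["bets"] += 1
store.put("user:42", user)
```

Note that there is no way in this sketch to ask a question spanning all users without visiting every node, which is the loss of relations discussed in the next paragraph.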

The transition from relational databases to key-value stores will probably include taking relations out of your application, re-modelling your application in order to group data together and streaming data to a reporting database. Relational databases gave the user both data storage as well as relations between entities which came as both a blessing and a curse (while relations made reporting easier they also gave un-checked access to data which allowed all sorts of corners to be cut). Well, key-value stores will take relations out: data access is more local, you typically have access only to what is immediate to a particular entity and you cannot perform queries spanning your entire data model. This type of data access could turn out to be actually a good thing, because these very close relationships will force an application to have a cleaner design since you will not be able to rely on monster SQL statements for compensating for design deficiencies.

In an application which needs to handle massive amounts of data relations between data will be delegated to activities which require them (reporting is a pretty good example) and data will be streamed from the key-value stores to the relational databases where these activities take place.
All in all, a pretty good division of tasks: applications with a (hopefully) cleaner model relying on key-value stores streaming data to relational databases for reporting.

None of the above means that relational databases will disappear: the vast majority of applications do not need to process the massive amounts of data which render your typical database impractical. Relational databases will be around for quite a while.

* Obviously, there are database clusters (such as Oracle RACs), but they are pretty costly.

** Examples of key-value stores are Google’s Bigtable, Amazon’s Dynamo and IBM’s eXtreme Scale.

*** Actually, there are constraints attached to it. In order to get the best performance you need to model your application so that data access occurs within the same node in order to avoid a distributed transaction spanning across multiple nodes. Please read this article by Billy Newport from IBM for a better understanding.

No Comments | Tags: Development, Management

27 May 2009 - 14:23 DSLs and APIs

In a previous post I was saying that some technological issues are simply proxies for organizational issues and that whether a technology will thrive or fail within an organization is not determined exclusively by the technology itself but can also be influenced by the organization. I ended the post saying:
For example, you may find out that DSLs will not work in particular setting not because DSLs are bad, but because the organization within which these DSLs are implemented and used simply doesn’t warrant their success.

I will get into an example where using DSLs is probably not a good fit because of organizational issues rather than because of the DSLs’ inner strengths and weaknesses. Let’s say that you have some distributed teams working on the same application and that these distributed teams are led by a team of architects who need to create an environment in which these distributed teams work together. I find it pretty hard to see how this environment could be constructed using DSLs, particularly iteratively-developed DSLs, because changes in the DSL would have to be propagated across multiple teams in dispersed locations which would need to be re-trained for every iteration. Compare this approach with creating an API that would be released every so often and for which re-training costs are very small.

I have the impression that the main cost of DSLs is the distance between the place where the DSL is produced and the place where the DSL is consumed and that DSLs tend to be either specific to the location where they are produced or widely known and used within a community (in the second case the distance between the DSL producer and the DSL consumer is compensated by “mind-share”). I would say that the main difference between a DSL and an API is that DSLs are more tightly bound to the domain (by definition) and that part of the costs of using a DSL is the cost of transferring the knowledge about that domain between various stake-holders. This would mean that domain experts can adapt to a DSL targeting their domain faster than a typical developer who is new to the domain and that domain experts would be more productive. The difference in productivity between various members of the work-force means that the work-force composition should partly drive the decision to choose a DSL or APIs on a particular project. Regarding the distance between DSL production and DSL consumption this paper discussing how distance affects knowledge transfers is probably a good read.

Other issues with DSLs, such as developer lock-in (a developer working with a DSL can become out of synch with the rest of the IT landscape), the lack of language design skills on the IT job market and the problem of capturing domain knowledge correctly in large or rapidly changing domains, are not explored in this post.

I end this post hurriedly…

Later Edit: I was watching this interview with Charles Simonyi and I would say I remained with these impressions:
If the DSL comes from an already existing language, such as domain experts’ notations, then there are chances that the DSL will be a good language (this addresses the issue of lack of language design skills on the IT job market). The problem of creating a new language is done away with by re-using a currently used language (the domain experts’ notations) and creating a new way to implement this existing language.
Regarding domains which are changing rapidly Charles Simonyi says that by separating the notation from the actual domain schema it becomes pretty easy to change the notation (the DSL) without affecting the underlying domain schema. Creating a DSL becomes an iterative process.
End-user programming is to remain a pipe-dream because of the difference between the skills needed for programming and the skills needed for working within the domain. DSLs seem to engage the domain expert more in the process of software creation rather than turn domain experts into programmers. Domain experts’ contributions to software creation thru DSLs may be simply to communicate the domain better to a programmer. Language work-benches would create a new division of tasks according to skills.

LLE: I watched this presentation by Intentional Software on domain work-benches and I remained with these impressions:
In an environment where domain workbenches are used it looks like the programmer will be involved in creating projections to an operational environment. The developer will focus on the systems while the domain experts will focus on the business logic. This division of tasks addresses the issues arising from developers’ lack of enthusiasm about the domain.
It looks like domain work-benches should be used in large and rapidly changing domains where the transfer of knowledge from domain experts to developers is very costly due to the size of the domain and the speed at which it moves (this answers one of my questions above).

I think that the argument that I was making above that DSLs will be location-specific still applies even with a domain workbench in the picture because knowledge transfers with high costs will not appear in the creation of the DSL itself (the DSL being just one notation for an underlying domain schema, one notation among many, it can be created and used locally) but rather in the creation of the domain schema itself (which may end up being a collaborative effort carried out among people in various locations).

As I was saying above I have the impression that in the near future domain workbenches and DSLs will be used where the costs of knowledge transfers from domain experts to developers are large. If the costs of creating languages start to drop and these domain work-benches become ubiquitous then it is possible that a complete separation between domain logic (stored in domain code and expressed thru various notations) and its implementation in a system (which appears to be essentially a projection of the domain code into an operational environment) will appear. It would be interesting to work with Intentional Software’s domain workbench too.

We will have to wait and see…

L3E: This is another presentation on the Domain Workbench.

No Comments | Tags: Favorites, Management