31 July 2010 - 16:11 Domain driven development and agile methods

Watching Eric Evans explaining how to incorporate agile methods in domain driven design gives you an understanding of some of the differences and commonalities between domain driven development and agile methods.

Agile methods grew out of the fact that up-front design, which involved months of analysis, was getting out of synch with a world which required faster development cycles and the ability to adapt to change at high speed (*). However, the responsiveness of agile methods and their focus on the next iteration limited the scope at which development is carried out and eliminated the design phase from the development process. The result was applications for which development slowed, if not stopped altogether, once the application reached a certain complexity.

Eric argues that in order to fix this problem we need to bring domain driven design back into the picture, but in a different form from the top-down, process-heavy manner in which it was conducted before. The main reason for modelling an application iteratively is that the development of an application is a learning process, a process of discovering and annotating a particular domain.

Some agile techniques, such as the ability to react swiftly to changes and to perform significant changes late in the game, are presented, but they seem to me to be liabilities rather than assets. Performing a significant change late in the game is a very expensive operation since it is essentially a re-organization of the concepts encapsulated in the model followed by the confusion generated by the dissemination of the new model; such things should be avoided at all costs. Rapid changes in the model are also bad since they can create confusion.

One agile practice which I think benefits domain driven design is the break-down of the design process into iterations during which new domain concepts are inserted into the model. It is very important that these iterations also make sure that the model stays in synch with the domain in order to avoid late, massive and expensive domain re-factorings. In order to do this Eric outlines a series of diagnostic measures used for detecting when the model is straying away from the domain.
While these measures are necessary, what is missing in this process is a way for the domain expert to validate the model. The knowledge transfers have no feed-back loop from the domain experts. The developer seems to be the sole owner of the model. Ideally the domain expert would become more engaged in the definition of the model rather than being a passive disseminator of knowledge.

Modelling brand new domains or domains in a state of change will probably always be error-prone because the development of an application in such a domain is a discovery process for all the parties involved (from users to business analysts to developers) whose end-result is defining the domain. In such an environment probably the only way to define and disseminate domain knowledge is to set up sessions with domain experts. However, knowledge transfers in mature domains should be handled differently in order to leverage the existing knowledge. It is a good idea to get into the domain by first reading up on it and then engaging in knowledge transfers with domain experts.

Eric Evans covers a lot of ground in his talk, from the methodology of transferring domain knowledge to diagnosing misalignments between the model and the domain, solutions for correcting such misalignments and strategies for coordinating teams working on the same project. However, I wish he had talked a bit about the structure and skill-sets of the teams involved in domain-driven development, about the ways to disseminate domain knowledge into a team and about how to engage the domain experts in reviewing and providing feed-back on the model being developed.
All in all a very good presentation which I encourage people to watch.

* The rejection of the analysis phase by agilists could partially be explained by the fact that the agile movement started at a point when brand new domains appeared (such as e-commerce, e-marketing, content management) which were in a continuous state of transformation and for which there was no prior knowledge. Analysis in this case was viewed, to a certain extent correctly, as a process which requires significant investments (in order to overcome the lack of knowledge) without a clear return (the results could be obsolete in a few months due to unforeseen changes).
At the same time the revolution in communications created the opportunity to add on stakeholders previously considered un-related to the domain. Encapsulating requirements from unforeseen stakeholders in an application meant that the domains became more prone to change and harder to predict.
These factors contributed to the misconception that analysis hinders rather than helps.

No Comments | Tags: Development, Management

29 March 2010 - 13:31 Replaying requests in flows

If you have followed various presentations on Event Driven Architecture for a while you must be familiar with one advantage that many people talk about without going into detail: the ability to recover from crashes simply by re-playing events that were sent to your system. Most presentations give the impression that a flow-based system based on passing messages is born with this ability, but the reality is that the system must be explicitly designed to provide this functionality.

When designing such a system you first have to ask yourself if you need this ability and I would say that the answer in quite a few cases is yes. The most basic recovery from crashes for a flow-based system consists of the message broker booting up, determining what messages have to be sent and re-sending the messages to the message consumers. Chances are that re-sending the exact messages that caused the crash will cause another crash and in order to avoid this you should be able to wipe out the message store, determine what events need to be re-played and replay them in an orderly fashion till your system goes back to its normal state.

Next, you should determine how to design such a system. One way to design it would be to code the stages as idempotent operations, that is, operations which when carried out multiple times give the same results. However, sometimes the stages of the model of the system are not easily captured in idempotent operations and sometimes it is downright impossible (*).
Another way to design it would be to break the flow into stages and the stages into 2 categories: idempotent stages and non-idempotent stages (**). Next, record the requests that come in and record each stage that a request has completed successfully. For non-idempotent stages also record the state of the request after their completion. Replaying requests in such a system consists of determining what requests are in non-idempotent stages and replaying them from these stages. For example, let’s say that you have a system that accepts orders, performs matchings on them (matches buys to sells), creates fills out of these matches and sends the fills out to an outside application. This system has 4 stages: order receival, order matching, fill creation and fill forwarding. Let’s say that order receival is a non-idempotent operation, order matching is an idempotent operation, fill creation is a non-idempotent operation and fill forwarding is another non-idempotent operation. In order to design such a system for re-play you will need to track each request across all the stages, determine which requests are in non-idempotent stages and, when a replay is needed, replay them from those stages.
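The bookkeeping described above could be sketched roughly as follows. This is only an illustration under my own assumptions: the class and method names (ReplayJournal, recordCompletion, lastCheckpoint, stagesToReplay) are hypothetical, in-memory maps stand in for a durable store, and "replaying from a non-idempotent stage" is interpreted as restarting right after the last non-idempotent stage whose state was recorded.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical journal; a real system would persist this data durably.
class ReplayJournal {
    // The four stages of the order example, flagged as idempotent or not.
    enum Stage {
        ORDER_RECEIVAL(false), ORDER_MATCHING(true), FILL_CREATION(false), FILL_FORWARDING(false);

        final boolean idempotent;
        Stage(boolean idempotent) { this.idempotent = idempotent; }
    }

    // requestId -> state recorded after each completed non-idempotent stage
    private final Map<String, Map<Stage, String>> snapshots = new HashMap<>();

    // Called whenever a request completes a stage successfully.
    void recordCompletion(String requestId, Stage stage, String stateAfter) {
        if (!stage.idempotent) {
            snapshots.computeIfAbsent(requestId, id -> new EnumMap<>(Stage.class))
                     .put(stage, stateAfter);
        }
    }

    // Last non-idempotent stage the request completed, or null if there is none.
    Stage lastCheckpoint(String requestId) {
        Map<Stage, String> byStage = snapshots.get(requestId);
        if (byStage == null) return null;
        Stage last = null;
        for (Stage s : byStage.keySet()) last = s; // EnumMap iterates in declaration order
        return last;
    }

    // State to restart from, as recorded at the last checkpoint (null if none).
    String checkpointState(String requestId) {
        Stage checkpoint = lastCheckpoint(requestId);
        return checkpoint == null ? null : snapshots.get(requestId).get(checkpoint);
    }

    // Stages that still have to run when the request is replayed.
    List<Stage> stagesToReplay(String requestId) {
        Stage checkpoint = lastCheckpoint(requestId);
        int from = checkpoint == null ? 0 : checkpoint.ordinal() + 1;
        List<Stage> remaining = new ArrayList<>();
        Stage[] all = Stage.values();
        for (int i = from; i < all.length; i++) remaining.add(all[i]);
        return remaining;
    }
}
```

With this sketch a request that crashed somewhere after order matching would be restarted from the state recorded after order receival, and order matching would simply be run again since it is idempotent.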

Replaying requests could also help you release a new version of the system in which the classes of objects which are sent from one stage to another change. Typically when such a release is carried out any message in transit cannot be processed anymore because of class versioning exceptions. Adding the option of replaying the requests after updating the flow with the latest release would help solve this problem.

* One example of an operation which may not be able to be made idempotent is sending messages to an external party. For this operation to be idempotent it would be necessary for the external party to be idempotent (that is, it would mean that the same message sent multiple times to the external party would have the same effect). This assumption sometimes turns out to be invalid.

** One example of an idempotent stage is a stage that performs some transformation/computation on the messages it receives and that forwards the messages to another stage. One example of a non-idempotent stage is a stage that persists data to a datastore or sends messages out to a non-idempotent external application.

No Comments | Tags: Development, Management

27 October 2009 - 13:13 Non-blocking flows

Recently I was working on a business flow to which we had to add a new requirement: grouping a particular type of transactions under a file. The file had to be unique per day, it had to be created on the fly when the transaction batch starts getting processed and the transactions had to be assigned to it at the end of processing. The first solution that one could think of is to change the flow to check if the file exists (creating it if it does not) and after this check to assign the transactions to that trade file.

However, doing only this would pose a concurrency problem, namely that two or more transaction batches arrive at the same time when no trade file has been created yet. If each transaction batch checked concurrently whether the trade file exists and tried to create it, again concurrently, we could end up with duplicate trade files. One way to avoid duplicate trade files is to detect if a trade file needs to get created and allow one of the transaction batches to create the file while blocking the other transaction batches till the trade file gets created. We looked at the costs of blocking and as the costs looked pretty small (we would be blocking only once per day when the file gets created) we decided to go ahead with blocking.
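Within a single JVM, this kind of "one batch creates, the others wait" behaviour can be sketched with ConcurrentHashMap.computeIfAbsent, which applies the creation function at most once per key and may block other threads updating the map while it runs. This is not the code we actually wrote: TradeFileRegistry and TradeFile are hypothetical names, and a flow spread over several processes would need the equivalent guarantee from the database or some other shared lock.

```java
import java.time.LocalDate;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical registry holding one trade file per day.
class TradeFileRegistry {
    // Stand-in for the real trade file record.
    static final class TradeFile {
        final LocalDate day;
        TradeFile(LocalDate day) { this.day = day; }
    }

    private final ConcurrentMap<LocalDate, TradeFile> filesByDay = new ConcurrentHashMap<>();
    private final AtomicInteger filesCreated = new AtomicInteger();

    // Returns the trade file for the given day, creating it if necessary.
    // The creation function is applied at most once per day; batches that
    // arrive while it runs wait and then receive the same TradeFile.
    TradeFile fileFor(LocalDate day) {
        return filesByDay.computeIfAbsent(day, d -> {
            filesCreated.incrementAndGet();
            return new TradeFile(d);
        });
    }

    // Number of files actually created, useful for checking for duplicates.
    int filesCreated() {
        return filesCreated.get();
    }
}
```

Each transaction batch would call fileFor with the processing date and assign its transactions to the returned file.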

However, this approach clearly doesn’t scale. We implemented it because the conditions for blocking happen very rarely (as I was saying, once per day), and it would not be feasible in the case of a higher amount of contention. We looked at some non-blocking alternatives and it looks like a good one would be to allow the transactions to check if the trade file exists and if not then to create the trade file on the fly (without blocking) and at the end of transaction processing send on a message saying that there is a risk that some data is inconsistent (namely that there is the risk that some files have duplicates and transactions are assigned to duplicate files) and establish a procedure for repairing the transactions (if necessary). This would allow for non-blocking flows and higher thru-put, but it would come at the expense of a period of time in which data is inconsistent (in our case there is the risk that some transactions will be assigned to duplicate trade files till the duplicate trade files get fixed).

If inconsistent data is OK for the business and the rest of the application (it could be that these repair procedures as well as inconsistent data affect other parts of the application) and if blocking flows are creating significant performance problems then allowing for data to be inconsistent for a certain period of time while providing a mechanism for detection and repair of inconsistencies would probably solve the problem.

Another solution to this problem would be to detect messages which may cause blocking and create a new stage in the flow which deals with such messages.

No Comments | Tags: Development, Favorites

2 July 2009 - 16:47 Key-value stores and relational databases

Relational databases are going thru a rough patch right now, some pundits going as far as writing them off. Their main problem is the fact that they are constrained to run within the same physical box (*) and scaling out is pretty hard. Once the datasets reach a certain size you will probably need to look beyond the typical relational database at other ways to store your data.

One type of alternative datastore that is getting a lot of attention is the distributed key-value store, which maps values to keys and then assigns keys to multiple storage nodes according to a hash function (**). In such a setting you get an object thru a key, you work with it and then you save it back in the datastore. The fact that this configuration scales out easily (***) makes this datastore very appealing, and if you are working with large datasets you will probably have no choice but to use something like it.
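To make the "assign keys to nodes according to a hash function" part concrete, here is a deliberately naive sketch. KeyPartitioner and the node names are hypothetical, and the plain modulo scheme is my simplification: it reshuffles most keys whenever a node is added or removed, which is why real stores tend to use more elaborate schemes such as consistent hashing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal key-to-node assignment using hash modulo node count.
class KeyPartitioner {
    private final List<String> nodes;

    KeyPartitioner(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one storage node is required");
        }
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    // The same key always lands on the same node as long as the node list is unchanged.
    // Math.floorMod keeps the index non-negative even when hashCode() is negative.
    String nodeFor(String key) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }
}
```

A client would call nodeFor with the key, fetch the object from that node, work with it and save it back to the same node.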

The transition from relational databases to key-value stores will probably include taking relations out of your application, re-modelling your application in order to group data together and streaming data to a reporting database. Relational databases gave the user both data storage as well as relations between entities which came as both a blessing and a curse (while relations made reporting easier they also gave un-checked access to data which allowed all sorts of corners to be cut). Well, key-value stores will take relations out: data access is more local, you typically have access only to what is immediate to a particular entity and you cannot perform queries spanning your entire data model. This type of data access could turn out to be actually a good thing, because these very close relationships will force an application to have a cleaner design since you will not be able to rely on monster SQL statements for compensating for design deficiencies.

In an application which needs to handle massive amounts of data, relations between data will be delegated to activities which require them (reporting is a pretty good example) and data will be streamed from the key-value stores to the relational databases where these activities take place.
All in all, a pretty good division of tasks: applications with a (hopefully) cleaner model relying on key-value stores streaming data to relational databases for reporting.

All the above doesn’t mean that relational databases will disappear: the vast majority of applications do not need to process the massive amounts of data which render your typical database impractical. Relational databases will be around for quite a while.

* Obviously, there are database clusters (such as Oracle RACs), but they are pretty costly.

** Examples of key-value stores are Google’s Bigtable, Amazon’s Dynamo and IBM’s eXtreme Scale.

*** Actually, there are constraints attached to it. In order to get the best performance you need to model your application so that data access occurs within the same node in order to avoid a distributed transaction spanning across multiple nodes. Please read this article by Billy Newport from IBM for a better understanding.

No Comments | Tags: Development, Management

31 March 2009 - 20:01 Assigning responsibilities

I think that this presentation by Rebecca Wirfs-Brock on driving the design from responsibilities pretty much hits the nail on the head: responsibilities drive design and good design is a good separation of responsibilities. From what I have seen most of the errors in software development arise either from grey-zones in which multiple responsibilities are implemented and overlap or from responsibilities which spread across multiple components. Both are examples of incorrect mapping of responsibilities to components.

All in all, a very good presentation.

No Comments | Tags: Development

23 September 2008 - 19:34 OOP inheritance

Typically OOP inheritance is used for sharing various common methods and a typical example would be something like this:
You have an interface Vehicle which has the sub-classes Bicycle and MotorVehicle. MotorVehicle in turn has the sub-classes Car and Truck, each with its specifics. Car, Truck and Bicycle are all sub-classes of Vehicle and as such all inherit the same common behavior, such as some object which specifies the speed limits within which each Vehicle should operate (for example a Car should be able to handle higher speeds than a Truck or a Bicycle).
Fair enough, and this way to use OOP inheritance has its reasons for use.

However, one other reason for using inheritance that I see is for creating substitutes in order to partition code bases. In this case you are using sub-classes not because you need some shared behavior to be encapsulated in a super-class, but because you have distributed teams working on the same codebase and you need to minimize collisions between them in the codebase (a collision would be 2 or more persons working on the same file and is usually resolved by a merge in CVS). Solving conflicts thru a CVS merge is a pretty error-prone process and doesn’t scale out well, so at one point it is necessary to partition development so that different teams can work on the same project without elbowing each other.
Sub-classing in this case would solve this problem by letting a developer substitute their team’s type for another team’s type.

One example is using helper classes which service a POJO and which are used by more than one team:
Let’s say that we have a TaxProcessor class that uses a TaxCalculator for calculating the taxes to be applied to a trade. Let’s say that you have 2 teams, one working on taxing bond trades and one working on taxing equity trades, and that they are both using the same TaxCalculator class on the TaxProcessor. TaxCalculator has the method taxTrade which delegates to taxBondTrade and taxEquityTrade. As time goes on a lot of fixes are put into taxTrade in order to deal with various corner-cases to the point where 1) TaxCalculator becomes hard to maintain and 2) more than one developer is working on the same method at the same time. A solution would be to sub-class TaxCalculator into BondTaxCalculator and EquityTaxCalculator, have each implement the method taxTrade according to its specifics and then have TaxProcessor use the appropriate class according to the trade type.
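A minimal sketch of what the split could look like is below. TaxProcessor, TaxCalculator, BondTaxCalculator, EquityTaxCalculator and taxTrade come from the description above; the wrapper class TaxExample, the Trade and TradeType types, the process method and the flat tax rates are assumptions made up purely for illustration.

```java
import java.math.BigDecimal;

// Wrapper class used only to keep the sketch in one file.
class TaxExample {
    enum TradeType { BOND, EQUITY }

    // Hypothetical trade: just a type and an amount.
    static final class Trade {
        final TradeType type;
        final BigDecimal amount;
        Trade(TradeType type, BigDecimal amount) { this.type = type; this.amount = amount; }
    }

    // Common super-type; each team owns exactly one sub-class (and one file).
    static abstract class TaxCalculator {
        abstract BigDecimal taxTrade(Trade trade);
    }

    static final class BondTaxCalculator extends TaxCalculator {
        @Override
        BigDecimal taxTrade(Trade trade) {
            return trade.amount.multiply(new BigDecimal("0.10")); // placeholder rate
        }
    }

    static final class EquityTaxCalculator extends TaxCalculator {
        @Override
        BigDecimal taxTrade(Trade trade) {
            return trade.amount.multiply(new BigDecimal("0.20")); // placeholder rate
        }
    }

    // Picks the calculator by trade type; this is the only place that knows about both teams' classes.
    static final class TaxProcessor {
        private final TaxCalculator bondCalculator = new BondTaxCalculator();
        private final TaxCalculator equityCalculator = new EquityTaxCalculator();

        BigDecimal process(Trade trade) {
            TaxCalculator calculator =
                trade.type == TradeType.BOND ? bondCalculator : equityCalculator;
            return calculator.taxTrade(trade);
        }
    }
}
```

The bond team now only touches BondTaxCalculator and the equity team only touches EquityTaxCalculator, so corner-case fixes for one kind of trade no longer collide with fixes for the other.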

As you can see from the above, good software models imply good separation of concerns. Good separation of concerns translates into good separation between development teams working in parallel on the same project. Good separation between development teams results in low transaction costs or interaction costs between teams, in a more efficient way to delegate work and in a development environment which scales better.

No Comments | Tags: Development, Management

28 August 2008 - 13:13 Technical debt

You are probably familiar with Martin Fowler’s Technical Debt, a metaphor around the idea that doing things the quick-and-dirty way creates bad code, bad code which can be viewed as a debt which needs to be paid later in installments (the principal is the re-factoring of the bad code and the installments are the extra effort that this bad code forces on development). The concept was originally coined by Ward Cunningham.

Technical Debt is pretty much different from ordinary, everyday debt primarily because there is no creditor and no maturity date. Also, unlike regular debt, it is very hard to quantify. Technical Debt seems a bastardized version of regular debt in the sense that some development costs are identified as the principal (the re-factoring of the bad code) and some as the interest (the hacks used for dealing with this bad code) and it is built by taking some concepts out of the ordinary debt concept as the author sees appropriate in order to underline some development costs.
At the same time Tech Debt is defined exclusively from the point of view of the borrower who has to worry about paying the principal and the interest leaving out other actors (even though once you take the creditor out of the picture you have to wonder why you have to pay this debt at all ;-)). But, like Martin Fowler said at the beginning of his article, this is a metaphor, which leaves a lot of room for stretching various concepts. Fair enough.

However, Technical Debt seems to have been taken up and spawned quite a few siblings as you can see in this infoq article: liquid assets, moral hazard, fertile assets (???) etc…, which are either unfortunate enough to bear little or no resemblance to the original economic concepts they try to refer to or unfortunate enough to not refer to anything (fertile assets stands out as a prime example).
Let’s pick for example “Liquid Asset” which is defined this way: Perhaps the term “technical debt” focuses us on the wrong things; maybe focusing on the converse, on the investment side of things might be more effective.
First of all, a real-world definition: a liquid asset is an asset for which transaction costs are low and for which there exists a market in which this asset can be sold and bought for a reasonable fee. Try to look into the above definition (or in this link which is provided next to it in the infoq example) and if you can spot the asset that is referred to, as well as the market in which this asset is sold and bought and the fees for these transactions, please let me know.
Second, switching terms in order to motivate people displays how shallow these concepts are to begin with: if Technical Debt were a concept that relates to something tangible in the real world, if it managed to encapsulate a real problem or some real costs, then Tech Debt should not be swept under the rug for fear of under-mining the developers’ morale, but rather dealt with up-front because in doing so you would solve a real problem. The fact that Tech Debt is used this way points to the fact that it is so vaguely defined that few people can extract something valuable out of it and that it can be stretched in any direction you want.

What I see when I am reading the infoq article mentioned above are the first steps in the attempt to “marry” 2 fields which are pretty different from one another, economics and writing code, and the excesses which appear when interest starts to pick up in some vague concept (excesses similar to the ones that followed the introduction of the SOA concept). There is a desire to bring IT development under control and one way to look at it is to define costs and benefits associated with various actions and then apply economics to these costs and benefits in order to write code more efficiently. I don’t know where this will lead because we are at the beginning of exploring this.

As far as I am concerned, I developed an interest in economics a while ago and I read on it as much as I can. I also tried to “marry” the domains of writing code and of economics, I even have an Econo-computing category, but I find it pretty hard. So far the only economic concept I found that can be applied pretty well to software development is transaction costs, because they actually encapsulate pretty well some very real costs which arise when various entities are interacting with one another.

I will be following this Econo-computing field with a lot of interest and, who knows, maybe the concepts in this field will actually start to relate to something tangible in the real world and will improve developer productivity.

No Comments | Tags: Development, Econo-computing

1 August 2008 - 23:12 EDA, clustered caches and triggers

A while ago I went to a presentation where the speaker was talking about the migration of an application from a typical MDB-based work-flow application to an EDA-based application. One of the drivers of this migration was the fact that their application’s data had grown so large that the database storing it was having performance problems. One of the solutions which were studied was to move the data tier into a clustered, transactional cache which would update the database asynchronously, essentially reducing the database to a simple store of data on which you could run reports.

Not a bad idea: the data would reside in this clustered cache and it would be made available to the application. One other thing that was considered was turning the whole architecture on its head and moving from the work-flow application to an EDA application. This is how it was supposed to work: the clustered cache came with the capability of attaching events to data as data was handled (inserts, updates, deletes, etc…). This would mean that while you previously had to have an MDB that was listening for a stock-quote request you could now code a stock quote object which you would throw in this clustered cache and which could listen for events. The clustered cache would both flush the stock quote objects to a database and fire events on them. The stock quote object would in turn update other objects (let’s say positions for that stock) and while these positions got updated they would fire more events in turn. A part of the business logic would be stored in these events and in the message passing that occurs between them, in a sort of event-driven architecture.

The more I think about this, the more I get the impression that these events are nothing more than triggers in a database (the clustered cache taking on the role of the database). And coding your business logic in triggers is not a good thing if you listen to people with experience in database programming. Relationships between various entities and the interactions between them should probably be modeled at a higher level than at object level. When you are coding business logic in triggers you are essentially limited to that object’s scope.

Now, the idea of putting the business logic into these triggers comes from a pretty hard problem: the fact that for most architectures out there the data store (the database which holds the data and which needs to be consulted by the business-logic tier) is physically away from the code that is implementing the business logic. The performance problems that come with this physical distance have been addressed in various ways: pushing the business logic into the database in the form of stored procedures, adding various caching mechanisms so that data is pushed towards the middle-tier, etc… In the example above part of the reason for pushing the business logic into these events was that you would effectively push the business logic into the new data-tier (the clustered cache in this case, which would trigger these events). I would say that this is a pretty interesting approach with some benefits, but attention should be paid to minimizing the number of events (triggers) which get implemented.
A decade of PL-SQL development would recommend this…

No Comments | Tags: Development

10 June 2008 - 12:20 Grid design patterns - de-normalized data

I am watching with a great deal of interest developments in the grid and large scale computing environment because I always found distributed computing interesting.

One very interesting thing that I come across pretty often is the fact that in large scale computing data tends to get de-normalized: basically it grows so large that you cannot fit the entities and the relationships between entities in the same database (this infoQ article is pretty good).
Let me give an example: suppose that you have a table of users and a table of messages between users having these columns:
users: userID int, firstName String, lastName String, email String
user_messages: fromUserID int, toUserID int, messageText String.
In the initial design a message from user A to user B would have only one record in user_messages and each user will be able to get the messages it has sent and the messages it has received from this table.
Now let’s say that the number of users sky-rockets to the point where you need to partition your database horizontally into a series of identical replicas. The problem that you will face now is where to store the data that is shared between these replicas, namely the table user_messages. You cannot store it in the replica hosting user A’s data, in the replica hosting user B’s data or in its own instance, because it will grow too large and because you will need to carry out a join over 2 physically remote databases. The solution is to drop user_messages in favour of received_user_messages (from_user_name, from_user_email, message_text) and sent_user_messages (to_user_name, to_user_email, message_text), which will store the messages that a user has received and the ones that it has sent. Each database replica will have its own instance of these 2 tables, each mapped to the user table. As you can see we are not storing the userID in the message tables, but rather the user information which was previously retrieved via joins.
Sending a message from user A to user B in this environment typically involves updating 2 tables in 2 separate database instances which usually is treated as a distributed transaction. However, you cannot carry out this distributed transaction because it is very costly so sometimes you resort to updating one table and sending a message to update the second table asynchronously (the overhead for delivering a JMS message is way less than the overhead for locking rows in the second database). In the case of updating the second table asynchronously you are still enforcing the relationships between the user table and the user message tables, but at a later point in time.
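The "update one table now, update the other one later" idea can be sketched as below. Everything here is a stand-in chosen for illustration: the class names are hypothetical, in-memory lists play the role of the sent_user_messages and received_user_messages tables on each replica, and a plain queue plays the role of the JMS destination.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical in-memory model of de-normalized messaging across two replicas.
class DenormalizedMessaging {
    static final class User {
        final String name;
        final String email;
        User(String name, String email) { this.name = name; this.email = email; }
    }

    // A row of sent_user_messages or received_user_messages: the other party's
    // name and email are copied into the row instead of being joined via userID.
    static final class MessageRow {
        final String otherUserName;
        final String otherUserEmail;
        final String messageText;
        MessageRow(String otherUserName, String otherUserEmail, String messageText) {
            this.otherUserName = otherUserName;
            this.otherUserEmail = otherUserEmail;
            this.messageText = messageText;
        }
    }

    // One database replica with its own copy of the two message tables.
    static final class Replica {
        final List<MessageRow> sentUserMessages = new ArrayList<>();
        final List<MessageRow> receivedUserMessages = new ArrayList<>();
    }

    // An update that still has to be applied to the recipient's replica.
    private static final class PendingUpdate {
        final Replica target;
        final MessageRow row;
        PendingUpdate(Replica target, MessageRow row) { this.target = target; this.row = row; }
    }

    private final Queue<PendingUpdate> pending = new ArrayDeque<>();

    // Writes the sender's row right away and queues the recipient's row,
    // instead of updating both replicas in one distributed transaction.
    void send(User from, Replica fromReplica, User to, Replica toReplica, String text) {
        fromReplica.sentUserMessages.add(new MessageRow(to.name, to.email, text));
        pending.add(new PendingUpdate(toReplica, new MessageRow(from.name, from.email, text)));
    }

    // Applies the queued updates, as an asynchronous consumer would; returns how many were applied.
    int deliverPending() {
        int delivered = 0;
        PendingUpdate update;
        while ((update = pending.poll()) != null) {
            update.target.receivedUserMessages.add(update.row);
            delivered++;
        }
        return delivered;
    }
}
```

Between send and deliverPending the recipient’s replica does not yet show the message, which is exactly the window in which the relationship is enforced "at a later point in time".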
De-normalized data comes with synchronization costs: when you are updating the user information on the users table you will need to propagate the changes to received_user_messages and sent_user_messages so that the data stored in these tables will be up-to-date. This could be done via asynchronous processing as well, sending a message about data being changed on an ESB and having the concerned parties listening for and processing it. Synchronization costs should be watched very carefully because they could spawn a very high number of messages. Ideally we would synchronize data which is updated rarely (such as user information) in order to keep the number of synchronization messages down. Carrying out synchronization procedures in batches could be a way to deal with a large number of synchronization messages with the side effect that it may increase the latency with which the synchronization takes place.

Large scale computing typically seems to revolve around asynchronous processing (because in transactions message passing is cheaper than database access) and de-normalized data, along with the overhead implied by it. The relationships previously enforced by normalizing data are usually enforced by passing JMS messages which carry out the data changes asynchronously. This design is driven primarily by the large volume of data which cannot be serviced by only one database and by the costs of carrying out a distributed transaction spanning 2 databases.
This is one design that I see emerging for large scale computing: de-normalized data along with asynchronous processes which are enforcing the relationships between entities via message passing.

It remains to be seen if this type of grid architecture will prevail in the future. Working against it is the emergence of multi-core processors which would allow for scaling up cheaply as envisaged by Brian Goetz in this article. If chips with hundreds of cores and with large amounts of memory become reality, scaling out could end up costing more than scaling up and all the above could become history pretty fast. Scaling out will continue to make sense in some environments with humongous amounts of data (think Amazon, Google, eBay, etc…) but for the current fastest growing segment in grid applications, typically applications processing moderately large amounts of data, it may make sense to scale up once chips with a large number of cores become a reality.

To watch…

P.S. These shifts in data processing (moving to a grid-type of architecture in order to accommodate the increase in data and then back to a single-box architecture because of the appearance of chips with a large number of cores) make a pretty good case for abstraction in order to lower the costs of transition from one architecture to another. Ideally the architecture and the environment in which an application runs should not affect the business logic of that application. One way to insulate your application’s business logic from these infrastructure issues is by abstraction.

No Comments | Tags: Development, Favorites

17 March 2008 - 17:22What is missing in OS testing tools

I was watching this infoQ presentation by Alexandru Popescu and Cedric Beust on testing when I realized that a big market for testing is seriously neglected.

Regarding my own use of testing, I would say that JUnit pretty much fits the bill and I don’t see a need to move to TestNG. I was watching the presentation not so much to learn more about TestNG as to get some exposure to the market for testing products (*). The features which were presented, and which were said to be in high demand from users, were the ability to define test groups, to test data connectivity and to define dependencies between tests (**). Fair enough: I take this functionality to be a logical progression from the ability to run a bunch of tests at random, as JUnit lets you do. You basically start looking at the tests you run from a higher level, and you look for ways to aggregate them into higher-level constructs, such as test groups, which can then be manipulated according to various needs.
So far I would say that TestNG is looking at ways to manage the complexity coming out of a high number of tests. TestNG is for test-heavy shops, where testing is considered a concern on par with, or close to, development. Not a bad thing, and I am pretty sure that TestNG covers some functionality which is in high demand by Java developers.

There is, however, one testing need that so far is largely unsatisfied and sorely missed: the need to test workflows or series of events. Let’s say that you have an application that is a series of MDBs, each accepting messages, transforming them and then outputting them to the next MDB. You would want to be able to test this application end-to-end.
Let’s say that you have an application that receives market trades and needs to process and transform them in order to send them downstream to tax systems, settlement systems, etc… You would want to put thru a trade set up in a certain way, then trace its execution thru these flows and determine whether, at what stage and in what shape this trade has reached the Application-to-Tax-System gateway.
When I think about work-flow testing I usually think about defining a message, inputting this message into a work-flow engine and then defining interceptors to see how the original message has been manipulated at various stages. You may need to test the work-flow itself (to see if the message went thru the path it needed to go thru), the end result (to see where the message was forwarded to and in what shape) and, if needed, how the message looks at various stages in this work-flow. I see work-flow testing as primarily concerned with the interception of messages (this would probably be a great use of AOP) and with the possibility to correlate messages passing thru the work-flow with the original message.
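
A rough sketch of this idea in plain Java is below. Nothing in it comes from an existing framework: `Message`, `Workflow`, the stage names and the interceptor callback are all hypothetical, and the stages are simple functions standing in for MDBs. It shows the two ingredients mentioned above: interceptors that observe the message as it leaves each stage, and a correlation id that ties every transformed copy back to the original message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.UnaryOperator;

// Hypothetical message: the correlation id survives every transformation.
final class Message {
    final String correlationId;
    final String payload;

    Message(String correlationId, String payload) {
        this.correlationId = correlationId;
        this.payload = payload;
    }
}

// Hypothetical workflow: an ordered list of named stages, each transforming the payload.
final class Workflow {
    private final List<String> names = new ArrayList<>();
    private final List<UnaryOperator<String>> stages = new ArrayList<>();
    private final List<BiConsumer<String, Message>> interceptors = new ArrayList<>();

    Workflow stage(String name, UnaryOperator<String> transform) {
        names.add(name);
        stages.add(transform);
        return this;
    }

    // Interceptors are called with the stage name and the message leaving that stage.
    Workflow intercept(BiConsumer<String, Message> interceptor) {
        interceptors.add(interceptor);
        return this;
    }

    Message run(Message input) {
        Message current = input;
        for (int i = 0; i < stages.size(); i++) {
            current = new Message(current.correlationId, stages.get(i).apply(current.payload));
            for (BiConsumer<String, Message> interceptor : interceptors) {
                interceptor.accept(names.get(i), current);
            }
        }
        return current;
    }
}

final class WorkflowTestSketch {
    // Puts one trade thru a toy flow and returns "stage:payload" entries for that trade only.
    static List<String> trace(String correlationId, String payload) {
        List<String> seen = new ArrayList<>();
        Workflow flow = new Workflow()
            .stage("validate", p -> p.trim())
            .stage("enrich", p -> p + "|ccy=USD")
            .stage("tax-gateway", p -> "TAX[" + p + "]")
            .intercept((stage, msg) -> {
                if (msg.correlationId.equals(correlationId)) {
                    seen.add(stage + ":" + msg.payload);
                }
            });
        flow.run(new Message(correlationId, payload));
        return seen;
    }

    public static void main(String[] args) {
        for (String entry : trace("trade-1", " BUY 100 XYZ ")) {
            System.out.println(entry);
        }
    }
}
```

A test can then assert on the recorded trace: which stages the trade went thru, in what order, and what it looked like when it reached the tax gateway. In a real asynchronous system the interceptors would have to collect messages across threads or processes, which is exactly the part this sketch leaves out.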

OS testing tools so far are limited to synchronous testing. I wonder at what point the need for work-flow testing will become so pressing, and the demand for it so great, that some OS shop will start implementing a framework for testing work-flows. Then we could use Event-Driven-Testing for testing an application written in an Event-Driven-Architecture manner…

* I know that trying to form an opinion on a market such as the market for OS testing tools from a vendor presentation is pretty risky business. What you get from vendor presentations is usually distorted by bias and time constraints, but I will assume that this presentation gave a fair image of what users want from testing tools.

** From what I know TestNG is a lot more than some annotations that give you the possibility to form test-groups and test dependencies. However, I would say that these issues are probably considered more important and more aligned with the market for OS testing tools since they were the ones which were included in this presentation.

P.S. If I were to choose one enterprise concern which is not addressed by the Spring stack I would choose work-flow. Spring has modules for integration, batch, transaction management, security management, connectivity to various end-points, etc…, but it is missing this capability. From what I see, work-flows are used pretty heavily in the enterprise space and for now I think they are mostly implemented either by in-house solutions or by commercial solutions. OS could probably make a contribution in this space as well. And if it does, it should probably try to give the developer the ability to test work-flows, or at least make it easy for him/her.

P.P.S. The comments by Alan Keefer on this thread reinforce my belief that we need to carry out tests at every level, from high-level tests down to unit tests. For a work-flow based system this would mean that we need to test the work-flows themselves and not only the units making them up.

Later edit: There are actually a few OS workflow tools: OSWorkflow, the ones at the bottom of this article and probably a few others. I should try them at some point.

No Comments | Tags: AOP, Development, Favorites