29 March 2010 - 13:31
Replaying requests in flows
If you have followed presentations on Event Driven Architecture for a while, you are probably familiar with an advantage that many speakers mention without going into detail: the ability to recover from crashes simply by replaying the events that were sent to your system. Most presentations give the impression that a flow-based, message-passing system is born with this ability, but in reality the system has to be designed to support it.
When designing such a system, first ask yourself whether you need this ability; I would say that in quite a few cases the answer is yes. The most basic crash recovery for a flow-based system consists of the message broker booting up, determining which messages still have to be delivered and re-sending them to the message consumers. Chances are, however, that re-sending the exact messages that caused the crash will cause another crash. To avoid this, you should be able to wipe out the message store, determine which events need to be replayed and replay them in order until your system is back to its normal state.
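A minimal Python sketch of that recovery path follows, assuming the system keeps a durable, append-only log of accepted events plus a snapshot of state known to be good. `EventLog`, `recover`, `apply_deposit` and the `skip` set (events set aside so they are not replayed into the same crash) are hypothetical names for illustration, not part of any broker's API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    seq: int       # position in the log; defines the replay order
    payload: dict

@dataclass
class EventLog:
    """Hypothetical append-only, durable log of every event the system accepted."""
    events: list = field(default_factory=list)

    def append(self, payload: dict) -> Event:
        event = Event(len(self.events), payload)
        self.events.append(event)
        return event

    def since(self, seq: int) -> list:
        return [e for e in self.events if e.seq >= seq]

def recover(log, pending_messages, snapshot, from_seq, apply, skip=frozenset()):
    # 1. Wipe the broker's message store rather than re-sending it as-is:
    #    it may still hold the message that caused the crash.
    pending_messages.clear()
    # 2. Start from state known to reflect every event before `from_seq`.
    state = dict(snapshot)
    # 3. Replay the remaining events strictly in log order.
    for event in log.since(from_seq):
        if event.seq in skip:   # hypothetical: events an operator has set aside
            continue
        apply(state, event)
    return state

def apply_deposit(state, event):
    # Hypothetical consumer logic: add the amount to the account's balance.
    account = event.payload["account"]
    state[account] = state.get(account, 0) + event.payload["amount"]

log = EventLog()
log.append({"account": "a", "amount": 10})  # seq 0, already in the snapshot
log.append({"account": "a", "amount": 5})   # seq 1
log.append({"account": "b", "amount": 7})   # seq 2
pending = [log.events[2]]
state = recover(log, pending, snapshot={"a": 10}, from_seq=1, apply=apply_deposit)
print(state)  # {'a': 15, 'b': 7}
```

Passing `skip={2}` instead would rebuild `{'a': 15}` and leave event 2 for later inspection.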
Next, you have to decide how to design such a system. One way is to code the stages as idempotent operations, that is, operations that give the same result no matter how many times they are carried out. However, the stages of a system's model are sometimes hard to express as idempotent operations, and sometimes it is downright impossible (*).
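To make the idempotency distinction concrete, here is a small Python sketch. Deduplicating by request id is just one common way to make an operation idempotent, swapped in here for illustration; `PositionStage` and its methods are hypothetical, and the sketch assumes the set of applied ids would be persisted atomically with the position (here both live only in memory).

```python
class PositionStage:
    """Hypothetical stage that keeps a net position per instrument."""

    def __init__(self):
        self.position = {}
        self.applied_ids = set()   # ids of requests this stage has already applied

    def apply_naive(self, request_id, instrument, delta):
        # Not idempotent: every replay of the same request adds `delta` again.
        self.position[instrument] = self.position.get(instrument, 0) + delta

    def apply_once(self, request_id, instrument, delta):
        # Idempotent: a request id is applied at most once, so a replay is a no-op.
        if request_id in self.applied_ids:
            return
        self.position[instrument] = self.position.get(instrument, 0) + delta
        self.applied_ids.add(request_id)

naive, safe = PositionStage(), PositionStage()
for _ in range(2):   # the original delivery plus one replay
    naive.apply_naive("r1", "XYZ", 10)
    safe.apply_once("r1", "XYZ", 10)
print(naive.position, safe.position)  # {'XYZ': 20} {'XYZ': 10}
```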
Another way is to break the flow into stages and split the stages into two categories: idempotent stages and non-idempotent stages (**). Then record every request that comes in, along with each stage that the request has completed successfully; for non-idempotent stages, also record the state of the request after the stage completes. Replaying a request in such a system consists of finding the last non-idempotent stage it completed and replaying it from that point using the recorded state; the idempotent stages after that point can safely be run again.
For example, say you have a system that accepts orders, matches them (buys against sells), creates fills from these matches and sends the fills to an outside application. This system has 4 stages: order receipt, order matching, fill creation and fill forwarding. Suppose that order receipt is a non-idempotent operation, order matching is an idempotent operation, and fill creation and fill forwarding are both non-idempotent operations. To design such a system for replay, you need to track each request across all the stages and record its state after each non-idempotent stage, so that when a replay is needed, each request can be resumed from the last non-idempotent stage it completed.
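The bookkeeping described above might look like the following Python sketch of the order example. It assumes a journal that records each incoming request, the stages it has completed and a checkpoint of its state after every non-idempotent stage; `Journal`, `Flow`, the stage functions, the resting sell order `s9` and the `crash_before` hook are all hypothetical. It also assumes a stage's side effect and its journal entry are committed together; a crash between the two is outside what this sketch handles.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Stage:
    name: str
    run: Callable[[dict], dict]
    idempotent: bool

@dataclass
class Journal:
    """Hypothetical durable journal; plain dicts stand in for a database."""
    requests: dict = field(default_factory=dict)     # request id -> original request
    completed: dict = field(default_factory=dict)    # request id -> stages completed, in order
    checkpoints: dict = field(default_factory=dict)  # request id -> (stage index, state after it)

class Flow:
    def __init__(self, stages, journal):
        self.stages, self.journal = stages, journal

    def submit(self, request_id, request, crash_before=None):
        self.journal.requests[request_id] = copy.deepcopy(request)  # record the request first
        self.journal.completed[request_id] = []
        return self._run(request_id, 0, copy.deepcopy(request), crash_before)

    def replay(self, request_id):
        # Resume right after the last completed non-idempotent stage. Idempotent
        # stages after it may run a second time, which is harmless by definition.
        if request_id in self.journal.checkpoints:
            index, state = self.journal.checkpoints[request_id]
            return self._run(request_id, index + 1, copy.deepcopy(state))
        return self._run(request_id, 0, copy.deepcopy(self.journal.requests[request_id]))

    def _run(self, request_id, start, state, crash_before=None):
        for index in range(start, len(self.stages)):
            stage = self.stages[index]
            if stage.name == crash_before:  # test hook that simulates a crash
                raise RuntimeError(f"crashed before {stage.name}")
            state = stage.run(state)
            self.journal.completed[request_id].append(stage.name)
            if not stage.idempotent:
                self.journal.checkpoints[request_id] = (index, copy.deepcopy(state))
        return state

# Hypothetical side-effect targets for the order example.
order_book, fill_store, outbox = [], [], []

def receive_order(s):    # non-idempotent: persists the order
    order_book.append(s["order_id"])
    return s

def match_order(s):      # idempotent: pure computation against a resting sell "s9"
    return {**s, "matches": [(s["order_id"], "s9", s["qty"])]}

def create_fills(s):     # non-idempotent: persists fills
    fills = [f"fill:{buy}-{sell}:{qty}" for buy, sell, qty in s["matches"]]
    fill_store.extend(fills)
    return {**s, "fills": fills}

def forward_fills(s):    # non-idempotent: sends fills to the outside application
    outbox.extend(s["fills"])
    return s

journal = Journal()
flow = Flow([
    Stage("order receipt", receive_order, idempotent=False),
    Stage("order matching", match_order, idempotent=True),
    Stage("fill creation", create_fills, idempotent=False),
    Stage("fill forwarding", forward_fills, idempotent=False),
], journal)

try:
    flow.submit("r1", {"order_id": "o1", "qty": 100}, crash_before="fill creation")
except RuntimeError:
    pass  # the process "crashed" after order matching

flow.replay("r1")
print(order_book, fill_store, outbox)
# ['o1'] ['fill:o1-s9:100'] ['fill:o1-s9:100']
```

Because order receipt had completed before the crash, the replay resumes at order matching: the idempotent matching stage runs a second time, but the order is not received twice. Replaying `r1` again would do nothing, since its checkpoint is now the last stage.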
Replaying requests can also help you release a new version of the system in which the classes of the objects sent from one stage to another have changed. Typically, when such a release is carried out, messages in transit can no longer be processed because of class versioning exceptions. Having the option to replay those requests after updating the flow to the latest release helps solve this problem.
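A rough Python illustration of that release scenario, assuming the external requests were recorded at intake and that their format did not change in the release. A version field and a `ValueError` stand in for class versioning exceptions, and `OrderMessage`, `MESSAGE_VERSION`, `intake` and `resume_after_release` are hypothetical. To stay short, the sketch replays only into the first stage; a real flow would then run the remaining stages.

```python
import json
from dataclasses import dataclass

MESSAGE_VERSION = 2  # hypothetical: release 2 added the `notional` field

@dataclass
class OrderMessage:
    """Hypothetical inter-stage message class as defined by the new release."""
    order_id: str
    qty: int
    price: float
    notional: float

def decode_in_flight(blob: str) -> OrderMessage:
    # Stand-in for a class versioning exception: messages written by the
    # previous release no longer match the current class definition.
    msg = json.loads(blob)
    if msg["version"] != MESSAGE_VERSION:
        raise ValueError(f"message version {msg['version']} != {MESSAGE_VERSION}")
    return OrderMessage(**msg["fields"])

def intake(raw_request: dict) -> OrderMessage:
    # First stage of the new release: rebuilds the inter-stage message from the
    # recorded external request, whose format the release did not change.
    qty, price = raw_request["qty"], raw_request["price"]
    return OrderMessage(raw_request["order_id"], qty, price, qty * price)

def resume_after_release(in_flight, recorded_requests):
    """Keep decodable in-flight messages; replay the rest from recorded requests."""
    messages, to_replay = [], []
    for request_id, blob in in_flight:
        try:
            messages.append(decode_in_flight(blob))
        except ValueError:
            to_replay.append(request_id)
    messages.extend(intake(recorded_requests[rid]) for rid in to_replay)
    return messages

recorded = {"r1": {"order_id": "o1", "qty": 10, "price": 2.5}}
old_blob = json.dumps({"version": 1, "fields": {"order_id": "o1", "qty": 10, "price": 2.5}})
print(resume_after_release([("r1", old_blob)], recorded))
# [OrderMessage(order_id='o1', qty=10, price=2.5, notional=25.0)]
```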
* One example of an operation that may be impossible to make idempotent is sending messages to an external party. For this operation to be idempotent, the external party would have to process messages idempotently (that is, the same message sent to it multiple times would have the same effect as sending it once). This assumption sometimes turns out to be invalid.
** One example of an idempotent stage is a stage that performs some transformation or computation on the messages it receives and forwards the results to another stage. One example of a non-idempotent stage is a stage that persists data to a datastore or sends messages to a non-idempotent external application.