17 December 2009 - 17:10 | Replication of data and replication of functionality

From my experience, replication of data usually helps deal with performance problems, but sometimes it also involves replication of functionality: you end up replicating some data on a different system (let’s call this the client system) and then find yourself needing to code on the client system some piece of business logic which resides on the system where the data originally comes from (let’s call this the master system).

More often than not this is not a pretty scenario, because every modification to the business logic done on the master system has to be re-coded on the client system. Sometimes you find that you need to coordinate the releases of these 2 separate systems in order to make the whole picture functionally consistent. One way to avoid re-coding the business logic on the client system is to expose the business logic on the master system thru some remoting mechanism (EJBs, Hessian, etc…) and then have the client system pass the data that it replicated locally to the master system in a synchronous call. While this solves the problem of having the same business logic reside in 2 places, it makes the client system dependent on the master system (*).

One solution to the above conundrum is to replicate the data, along with the results of carrying out that piece of business logic on that data, to the client system. With this arrangement the client system does not need to have the master system up and running, and it also does not need to re-code the same business logic. If you manage to use this (this approach doesn’t work in all cases) you will find yourself exporting business logic asynchronously by exporting both data and the results of carrying out that business logic on the data.
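A rough sketch of this arrangement, just to make the idea concrete. Everything in it is made up for illustration: the `Order` record, the `compute_tax` rule and the `ClientStore` class are hypothetical names, not part of any real system. The point is only that the master ships the data together with the computed result, so the client never re-implements the rule.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Order:
    order_id: str
    amount: float
    country: str


def compute_tax(order: Order) -> float:
    """Business logic that lives only on the master system (hypothetical rule)."""
    rate = 0.25 if order.country == "DK" else 0.2
    return round(order.amount * rate, 2)


def export_for_replication(order: Order) -> dict:
    """Master side: ship the data together with the result of the business logic."""
    record = asdict(order)
    record["tax"] = compute_tax(order)
    return record


class ClientStore:
    """Client side: keeps replicated records and never calls compute_tax."""

    def __init__(self):
        self._records = {}

    def apply(self, record: dict) -> None:
        # Called asynchronously whenever a replicated record arrives.
        self._records[record["order_id"]] = record

    def total_due(self, order_id: str) -> float:
        # Uses the replicated result; the master does not need to be up.
        record = self._records[order_id]
        return record["amount"] + record["tax"]
```

If the tax rule changes on the master, only `compute_tax` changes; the client keeps working with whatever results get replicated to it.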

If I don’t write another post till January I wish you a Happy New Year!!!!!

* There is a physical dependency (the client system needs the master system up and running for it to service calls) and possibly a library dependency which may couple the release cycles of these 2 systems (if the client system communicates with the master system thru objects rather than thru wire protocols you will need to synchronize the release cycles of these 2 systems in order to make sure that the client system doesn’t run in PROD with obsolete libraries).

No Comments | Tags: Management

27 October 2009 - 13:13 | Non-blocking flows

Recently I was working on a business flow to which we had to add a new requirement: grouping a particular type of transaction under a file. The file had to be unique per day, it had to be created on the fly when the transaction batch starts getting processed, and the transactions had to be assigned to it at the end of processing. The first solution that one could think of is to change the flow to check if the file exists (and if not, create it) and after this check assign the transactions to that trade file.

However, doing only this would pose a concurrency problem, namely that two or more transaction batches may arrive at the same time when no trade file has been created yet. If each transaction batch checked concurrently whether the trade file exists and then tried to create it, again concurrently, we could end up with duplicate trade files. One way to avoid duplicate trade files is to detect that a trade file needs to get created and allow one of the transaction batches to create the file while blocking the other transaction batches till the trade file gets created. We looked at the costs of blocking and, as the costs looked pretty small (we would be blocking only once per day, when the file gets created), we decided to go ahead with blocking.
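The blocking variant can be sketched roughly as below. This is not the code we wrote; it is an in-memory illustration in which `TradeFileRegistry` and `run_batches` are hypothetical names and a dictionary stands in for wherever the trade files really live. Batches block only while no file exists for the day; once it exists they take the lock-free path.

```python
import threading
from datetime import date


class TradeFileRegistry:
    def __init__(self):
        self._files = {}               # day -> trade file id
        self._lock = threading.Lock()
        self.created = 0               # how many files were actually created

    def get_or_create(self, day: date) -> str:
        file_id = self._files.get(day)
        if file_id is not None:
            return file_id             # common path: file exists, no blocking
        with self._lock:               # rare path: at most once per day
            # Re-check inside the lock: another batch may have created it.
            file_id = self._files.get(day)
            if file_id is None:
                file_id = f"trade-file-{day.isoformat()}"
                self._files[day] = file_id
                self.created += 1
            return file_id


def run_batches(registry: TradeFileRegistry, day: date, batch_count: int) -> list:
    """Simulate several transaction batches arriving at the same time."""
    results = []

    def worker():
        results.append(registry.get_or_create(day))

    threads = [threading.Thread(target=worker) for _ in range(batch_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The re-check inside the lock is what prevents the duplicate: a batch that waited for the lock finds the file already there and simply uses it.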

However, this approach clearly doesn’t scale. We implemented it because the conditions for blocking happen very rarely (as I was saying, once per day); it would not be feasible with a higher amount of contention. We looked at some non-blocking alternatives, and it looks like a good one would be to allow the transactions to check if the trade file exists and, if not, to create the trade file on the fly (without blocking). At the end of transaction processing we would send a message further down the flow saying that there is a risk that some data is inconsistent (namely that some files have duplicates and transactions are assigned to duplicate files) and establish a procedure for repairing the transactions (if necessary). This would allow for non-blocking flows and higher thru-put, but it would come at the expense of a period of time in which data is inconsistent (in our case there is the risk that some transactions will be assigned to duplicate trade files till the duplicate trade files get fixed).
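Here is a minimal sketch of the non-blocking alternative, under the assumption that a repair step may run later. The class `OptimisticTradeFiles` and its methods are hypothetical; the race is simulated by letting two batches check for the file before either of them creates it.

```python
import itertools
from collections import defaultdict


class OptimisticTradeFiles:
    def __init__(self):
        self._ids = itertools.count(1)
        self.files_by_day = defaultdict(list)  # day -> file ids (duplicates possible)
        self.assignments = {}                  # transaction id -> file id
        self.repair_queue = []                 # days flagged as possibly inconsistent

    def find_file(self, day):
        """Non-blocking check; the answer may be stale by the time it is used."""
        files = self.files_by_day.get(day)
        return files[0] if files else None

    def process_batch(self, day, transaction_ids, observed_file):
        """observed_file is whatever the batch saw when it checked earlier."""
        file_id = observed_file
        if file_id is None:
            # Create without blocking; a concurrent batch may do the same.
            file_id = next(self._ids)
            self.files_by_day[day].append(file_id)
            # Tell the rest of the flow that this day may hold duplicates.
            self.repair_queue.append(day)
        for transaction_id in transaction_ids:
            self.assignments[transaction_id] = file_id

    def repair(self, day) -> int:
        """Merge duplicates into the first file; return how many were removed."""
        files = self.files_by_day.get(day, [])
        if not files:
            return 0
        canonical, duplicates = files[0], set(files[1:])
        for transaction_id, file_id in self.assignments.items():
            if file_id in duplicates:
                self.assignments[transaction_id] = canonical
        self.files_by_day[day] = [canonical]
        return len(duplicates)
```

Between `process_batch` and `repair` the data really is inconsistent, which is exactly the price mentioned above.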

If inconsistent data is OK for the business and the rest of the application (it could be that these repair procedures as well as inconsistent data affect other parts of the application) and if blocking flows are creating significant performance problems then allowing for data to be inconsistent for a certain period of time while providing a mechanism for detection and repair of inconsistencies would probably solve the problem.

Another solution to this problem would be to detect messages which may cause blocking and create a new stage in the flow which deals with such messages.
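One way to picture such a stage, as a sketch only: messages for a day that has no trade file yet are routed to a single-threaded stage, which is the only place where files get created, while all other messages take a fast path. `FileCreationStage` and `route` are hypothetical names, and the in-memory queue stands in for whatever messaging the real flow uses.

```python
import queue
import threading


class FileCreationStage:
    """A single-threaded stage: the only place where trade files get created."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.files = {}        # day -> trade file id
        self.assignments = {}  # transaction id -> trade file id
        self._worker = threading.Thread(target=self._run)
        self._worker.start()

    def _run(self):
        while True:
            message = self.inbox.get()
            if message is None:  # sentinel: shut the stage down
                return
            day, transaction_id = message
            # Only this thread creates files, so check-then-create cannot race.
            file_id = self.files.setdefault(day, f"trade-file-{day}")
            self.assignments[transaction_id] = file_id

    def close(self):
        self.inbox.put(None)
        self._worker.join()


def route(message, stage: FileCreationStage) -> str:
    """Send messages that might trigger file creation to the dedicated stage."""
    day, transaction_id = message
    file_id = stage.files.get(day)
    if file_id is not None:
        stage.assignments[transaction_id] = file_id  # fast path: file exists
        return "fast"
    stage.inbox.put(message)
    return "serial"
```

The main flow never blocks; the cost is that the messages routed to the dedicated stage are processed one at a time.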

No Comments | Tags: Development, Favorites

26 September 2009 - 11:34 | When knowledge is a liability and not an asset

Typically knowledge is considered to be an asset, some going as far as saying that knowledge is power. Well, when you design interactions between systems it is better to avoid having systems know too much about each other. Having deep knowledge about a system with which you are integrating is both a moral hazard (it gives you the opportunity to hack your way out of problems) and a source of high transaction/interaction costs (typically the costs of transferring this deep knowledge to someone else).
The same reasoning goes for interaction between modules in a single system; please check out this article.

P.S. I was shocked to see the date on that article: 1971. The fact that we are still struggling with these problems shows a pretty big problem with computer science. Most people equate computer science with exotic algorithms; it is about time to add software modelling to this equation. It is pathetic when useful concepts for writing software developed more than 30 years ago are still unknown to the typical developer.
Software development has a pretty big educational problem (most developers educate themselves on their own and learn on the job), and it would not be a bad idea to address it.

No Comments | Tags: Management

13 August 2009 - 19:52 | VMWare acquires SpringSource

Decidedly, this is the acquisition season. After Oracle buying BEA it is now the time for VMWare to acquire SpringSource. I wonder what VMWare gets from this acquisition that it would not get from a partnership with SpringSource, and the only thing that I can think of is the exclusivity of selling certain types of higher-value virtual machines.
To make this more clear: VMWare probably has a cloud offering, similar to Amazon’s Elastic Compute Cloud offering. EC2 started by selling bare-bones images and moved on to selling higher-value images, also known as Amazon Machine Images (the idea is that you could sell/make available your software - application server, database, content management server, etc… - as an Amazon image and charge per usage for the use of the original Amazon image + the use of your application). It is possible that VMWare will move on to selling higher-value virtual machines thru its cloud offering as well. Now that VMWare has acquired SpringSource it is very likely that they will start selling enterprise virtual machines equipped with all the packages present in SpringSource’s portfolio (which also includes some higher-value packages such as Spring Batch, Spring Integration, etc…). As SpringSource is now under VMWare I would not be very surprised if VMWare gets the exclusivity of providing enterprise, Spring-enabled, higher-value virtual machines (apparently this is not the case, check below). Not sure if it is feasible, but VMWare could even adapt Spring’s higher-value packages into specialized virtual machines (for example, a virtual machine running Spring Batch fine-tuned for batching needs).

If this happens, VMWare could become the premier hosting service for cloud-deployed enterprise Java applications and for specialized, Java-powered, cloud-deployed enterprise services. If the cloud is the future of application hosting (frankly, I would not bet the farm on it) then VMWare will be the hub where Java applications will be hosted (the difference between hosting a Spring application with VMWare and with a different cloud provider would be the difference between a virtual machine specialized for running Spring applications and an all-purpose virtual machine. This difference in the efficiency of hosting a Spring application would drive more users to VMWare’s clouds).

Still, 420 million dollars is a pretty big sum of money, especially in the current economic climate, and we will have to see if VMWare and SpringSource will find the synergies which will make their pairing more than the sum of its parts.

Congratulations to Rod Johnson, Adrian Colyer, Juergen Hoeller and the rest of the Spring team and all the best to VMWare and SpringSource!!

Later edit: It looks like you can buy Spring images on Amazon’s cloud thru a company called Cloud Foundry which was acquired by SpringSource. I am still trying to figure out why VMWare bought SpringSource rather than entering a licensing agreement with them. I guess we have to wait and see.

Later edit: According to Adrian Colyer this is the future of how Java applications will be deployed. Still, I don’t buy into the cloud hype; I think that for the near future cloud adoption will be determined mostly by prices for different pieces of hardware. Maybe at one point there will be a difference in the efficiency of managing applications deployed in a cloud versus the ones deployed outside which would compensate for the differences in price of hosting an instance locally or on the cloud, but I don’t think we are near this point.

No Comments | Tags: Miscellaneous

9 August 2009 - 2:11 | Copyright as a fading concept

I have been reading lately A History of Economics: The Past as the Present by John Kenneth Galbraith, and one of its main ideas, namely that economic concepts are the product of the times and societies they are born in rather than the other way around, made me look at the concept of copyright from a different perspective.

The copyright concept is tied to the ability to make copies: when this ability did not exist (such as prior to the phonograph) or was severely impaired (such as during the vinyl era) the concept of copyright did not exist because there was no need for it. Copyright appeared when music recording moved to a medium which allowed for music to be copied easily enough for music buyers to make their own copies (roughly around the introduction of the compact cassette), hence the introduction of the right to copy. In the current environment characterized by very low copy costs and very inefficient means of enforcing rights to owning a copy you could say that the concept of copyright is living its last moments.

Copyright may appear to future historians as a concept which lasted for a period of time during which the ability to copy content was both developed enough to worry content creators and distributors about people getting content without paying for it and at the same time under-developed enough to allow the content creators and distributors to develop mechanisms for controlling content copies. Once the ability to copy and share content became ubiquitous the right to a copy of content ceased to exist as the means of enforcing this right started to weaken.

The question that is posed quite frequently is what will happen to musicians since they will not be able to profit from copyright dues anymore. I would say that you could compare their fate to the fate of musicians before the concept of copyright started to exist, which means looking at music before the introduction of the compact cassette (whose low copying costs allowed for easier manufacture of copies, which in turn allowed for greater diversity in music): musicians would either attain the status of Enrico Caruso and make their living by selling music copies or would rely on gigs, building a more or less devoted following. The main difference between a musician relying on gigs in 1910 and in 2010 would be that the musician living in the 2010’s would act at a global scale (in terms of distributing and marketing his music, of setting up a tour, of building a fanbase, etc…) while the musician living in the 1910’s would probably have acted at a local scale.

No Comments | Tags: Miscellaneous

29 July 2009 - 14:42 | Old and new media

For the past 2 weeks I have been staying at home without a computer and without cable TV. I took this opportunity to try to see how my grand-parents would get their news and read newspapers throughout this period. When I went back to work and back to a computer I saw that I had not missed much: the newspapers covered all the major events of this small period, such as ethnic unrest in China, a continuous drop in equity markets and the subsequent rebound, etc… The number of news items that got to me thru this channel was a lot smaller than the usual number that I get, but I found that the coverage was way better and the subjects were treated in depth.

I think that the main difference between the old media and the new media is that the old media is delivered thru a vastly narrower channel than the new media, and that the width of this delivery channel pretty much dictates the format and the news that reach you.

Your typical daily is a newspaper with a limited number of pages. The number of pages is both large enough so that it allows the daily to cover more sections, but at the same time small enough to keep the number of sections in check. A typical daily ends up having Local, National, International, Business and Sports sections.
The limited number of pages means that the stories are competing with each other in order to get printed. The result is that a daily covers only the issues which are the most important to its readership. Some print dailies try to compensate for the limited number of articles that they make available by the quality of these articles (this partly supports the case of print media moving into the luxury goods category because of the scarcity of items its format can deliver when compared with new media).
The limited number of pages also means that there is competition between the journalists wanting to get published, and this further enhances the quality of the newspaper. Typically, print media which has a large supply of journalists (New York Times, Washington Post, Los Angeles Times, etc…) tends to print articles of very good quality as only the best journalists get published.

New media changes all this: the very wide channel along which content gets pushed to readers increases the diversity of the content while it diminishes its quality. The only constraint that I see on content consumption in new media is not the scarcity of supplied content, but rather the opportunity cost which comes with consuming a particular piece of content: when you are consuming some content you are forgoing consuming some other content.
I think that opportunity cost will define the way content gets consumed in the new media and that the organizations operating in the new media will have to dedicate a bigger amount of resources to marketing themselves because the competition between new media organizations will be far fiercer as the typical boundaries between content sources disappear (*).

* One example of a boundary between content sources which disappears and generates competition is the spatial distribution of newspapers: New York Times and Los Angeles Times, which were not competing with one another previously (each being distributed in a different city, they were competing only with local newspapers), are now competing with each other when they are both distributed via the web.

No Comments | Tags: Miscellaneous

2 July 2009 - 16:47 | Key-value stores and relational databases

Relational databases are going thru a rough patch right now, some pundits going as far as writing them off. Their main problem is the fact that they are constrained to run within the same physical box (*) and scaling out is pretty hard. Once the datasets reach a certain size you will probably need to look beyond the typical relational database at other ways to store your data.

One type of alternative datastore that is getting a lot of attention is the distributed key-value store, which maps values to keys and then assigns keys to multiple storage nodes according to a hash function (**). In such a setting you get an object thru a key, you work with it and then you save it back in the datastore. The fact that this configuration scales out easily (***) makes this datastore very appealing, and if you are working with large datasets you will probably have no choice but to use something like it.
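To make the key-to-node assignment concrete, here is a toy sketch in which plain dictionaries play the role of storage nodes. The class name and the modulo-based placement are my own simplifications; real stores typically use more elaborate schemes (Dynamo, for example, uses consistent hashing), so take this only as an illustration of the principle.

```python
import hashlib


class DistributedKeyValueStore:
    def __init__(self, node_count: int):
        # Each dictionary stands in for a separate storage node.
        self._nodes = [{} for _ in range(node_count)]

    def node_for(self, key: str) -> int:
        """Pick a node by hashing the key (simple modulo placement)."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._nodes)

    def put(self, key: str, value) -> None:
        self._nodes[self.node_for(key)][key] = value

    def get(self, key: str):
        return self._nodes[self.node_for(key)].get(key)


store = DistributedKeyValueStore(node_count=4)
store.put("customer:42", {"name": "Ada", "orders": 3})
customer = store.get("customer:42")                      # get the object thru its key
customer["orders"] += 1                                  # work with it
store.put("customer:42", customer)                       # save it back
```

Adding capacity means adding nodes; the hash function decides where each key lives, so no single box has to hold the whole dataset.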

The transition from relational databases to key-value stores will probably include taking relations out of your application, re-modelling your application in order to group data together and streaming data to a reporting database. Relational databases gave the user both data storage as well as relations between entities which came as both a blessing and a curse (while relations made reporting easier they also gave un-checked access to data which allowed all sorts of corners to be cut). Well, key-value stores will take relations out: data access is more local, you typically have access only to what is immediate to a particular entity and you cannot perform queries spanning your entire data model. This type of data access could turn out to be actually a good thing, because these very close relationships will force an application to have a cleaner design since you will not be able to rely on monster SQL statements for compensating for design deficiencies.

In an application which needs to handle massive amounts of data, relations between data will be delegated to the activities which require them (reporting is a pretty good example) and data will be streamed from the key-value stores to the relational databases where these activities take place.
All in all, a pretty good division of tasks: applications with a (hopefully) cleaner model relying on key-value stores streaming data to relational databases for reporting.
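A small sketch of this division of tasks, using a dictionary as the key-value store and an in-memory SQLite database as the reporting database. The `orders` table and the two function names are invented for the example, and a real setup would stream changes continuously rather than copy everything in one go.

```python
import sqlite3


def stream_to_reporting(kv_store: dict, connection: sqlite3.Connection) -> int:
    """Copy entries from the key-value store into a relational reporting table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT PRIMARY KEY, customer TEXT, amount REAL)"
    )
    rows = [(key, value["customer"], value["amount"]) for key, value in kv_store.items()]
    # INSERT OR REPLACE keeps the copy idempotent if an entry is streamed twice.
    connection.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


def revenue_per_customer(connection: sqlite3.Connection) -> list:
    """A query spanning all the data: easy in SQL, awkward in a key-value store."""
    cursor = connection.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    )
    return cursor.fetchall()
```

The application itself only ever reads and writes by key; the cross-entity questions are asked of the reporting copy.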

All the above doesn’t mean that relational databases will disappear: the vast majority of applications do not need to process the massive amounts of data which render your typical database impractical. Relational databases will be around for quite a while.

* Obviously, there are database clusters (such as Oracle RAC), but they are pretty costly.

** Examples of key-value stores are Google’s Big Table, Amazon’s Dynamo and IBM’s eXtreme scale.

*** Actually, there are constraints attached to it. In order to get the best performance you need to model your application so that data access occurs within the same node in order to avoid a distributed transaction spanning across multiple nodes. Please read this article by Billy Newport from IBM for a better understanding.
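As a rough illustration of that constraint (my own sketch, not taken from the article): if keys are routed by the entity they belong to rather than by the full key, everything about one entity lands on the same node and an update touches a single node. The `entity/detail` key layout and the function names are assumptions made for the example.

```python
import hashlib


def partition_key(key: str) -> str:
    """Route by the owning entity: 'customer:42/order:1' -> 'customer:42'."""
    return key.split("/", 1)[0]


def node_for(key: str, node_count: int) -> int:
    digest = hashlib.sha256(partition_key(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def nodes_touched(keys, node_count: int) -> set:
    """Which nodes a unit of work spanning these keys would involve."""
    return {node_for(key, node_count) for key in keys}


keys = ["customer:42/profile", "customer:42/order:1", "customer:42/order:2"]
touched = nodes_touched(keys, node_count=8)  # a single node for this customer
```

Work that spans several customers may still involve several nodes, which is why the data model has to be grouped around the entities you usually touch together.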

No Comments | Tags: Development, Management

30 June 2009 - 21:10 | Not all information wants to be free

I was reading Malcolm Gladwell’s review of Free by Chris Anderson and I was left with the impression that both of them got the issue of information distribution in the current era of low prices for computing power pretty wrong.
I think that Malcolm Gladwell makes a mistake when he compares the information required for creating a bio-tech medicine (Myozyme) with the information found in a newspaper. One big difference between them is the penalties associated with copyright infringement: huge in the case of Myozyme and small (and disappearing) in the case of the newspaper. Another difference between the information necessary for creating a medicine and the information stored in a newspaper is the difference between the costs of producing it and the related difference in supply: high costs in the case of a bio-tech company and low in the case of a newspaper.

I think that Chris Anderson also makes a mistake when believing that the main costs which define how information is consumed are the costs of distributing information. Distributional costs are just one part of the story; one other important cost is the cost of generating the content. Given that in the current period the distributional costs are very low, it would follow that the costs of generating the content would be the bulk of the costs of consuming information (i.e. the cost of consuming information would be identical or very close to the costs of creating the content). Basically what happens right now is that the supply of content is dwarfing the demand for it; we are simply swamped in content. Well, this mismatch between supply and demand drives down the revenues of content sources. In addition to this, one side-effect of low distribution costs is that various content sources are put in pure and un-distorted competition, and it is this competition that is driving the revenues of various content providers even lower.
It appears that the content sources that are not experiencing reduced revenues are the ones for which the content supply is very small when compared to demand (which creates the scarcity that could allow a content source to charge for usage) and which are not experiencing heavy competition (some financial newspapers being a very good example). If anything, these content sources have found in the web a new channel thru which to distribute their small supply of content. I would say that content sources which exhibit a small supply (small enough to create scarcity), a reasonable demand and not a lot of competition will be the content sources which will be economically viable in the coming years (*).

The near future will be very challenging for the traditional newspapers, because the tremendous amount of excess capacity in this space will have to be reduced thru various means (**). Once reduced, the remaining capacity will probably be small enough to match the demand for the content it produces, and at that point various revenue models will appear. And charging for usage will be part of those revenue models, regardless of what various luminaries think about it.

To sum this post up I would say that the newspapers are currently in a pretty bad corner: the new low distribution costs have put them in fierce competition which resulted in massive excess capacity (***). The excess capacity will be dealt with one way or another and at the end of this process the remaining newspapers will come back to health.

Later edit: If you have read enough material on the changes currently affecting newspapers it is very likely that you have come across the concept of “economics of abundance”, a term used in science-fiction before the Web2.0 “revolution”. In the context of current changes in media “economics of abundance” is just another synonym for over-capacity.

* Incidentally, niche content sources fit this description pretty well.

** The excess capacity could be reduced by differentiation. One example would be New York Times raising the price for its street paper, which some argue pushes NYT into the category of luxury goods, differentiating it from the other newspapers. Specializing in a new field is another way to direct excess capacity towards more productive uses.

*** The increased competition between newspapers comes from the fact that previously the competition was mostly local, between local newspapers, because content distribution was mostly local, while in the current period the distribution is global; therefore more newspapers, from any conceivable place, are now competing for the same pairs of eyeballs. Right now the same news item can reach you thru a very large number of channels, while previously it could reach you only thru a few.
I don’t include the tens of millions or so of blogs in the excess capacity mentioned above because blogs are not in the same league as newspapers: the differences in quality between a typical blog and a newspaper separate these 2 types of content.

No Comments | Tags: Miscellaneous

2 June 2009 - 21:20 | Message translation in GMail

You have probably read about how GMail added a new feature that lets you read messages written in a different language, a language that you may not understand, by having them translated into your language. This is a pretty interesting feature that will not get much adoption outside Google labs, the main reason for this being not the quality of the translation (not great) but rather the fact that most people need to initiate some sort of contact in order to start exchanging emails.

All this aside, it is a pretty interesting proposition from Google and leaves you wondering how a group of people who communicate with one another using the translator package just added to GMail would communicate outside GMail, let’s say in a face-to-face environment (for example a group of experts from various places, all speaking different languages, attending the same conference). Would they be emailing each other while seated at the same table, given that they do not speak a common language (*)?
A pretty weird feeling is setting in: you may end up having relationships with people only thru this virtual medium which negates all other forms of communication. The language barrier will be replaced by an expressiveness barrier: many things will not be expressible thru this medium and will be left out.

* BTW, the message used in the TechCrunch example is a very good example of a message that would never get translated because it would never be written in a different language: if you want to invite a hiking buddy out for a walk you would probably want to be able to communicate with him/her in a common language, not thru the GMail translator… I wonder what a walk thru a park is like when you need to drag along GMail (preferably deployed on your iPhone) in order to talk to someone. Surreal…

No Comments | Tags: Miscellaneous

27 May 2009 - 19:11 | Laptops vs mobile phones

I was reading this article on the Experimentia blog and I have to agree with its conclusion: internet applications are turning computing devices from platforms where applications are run into interfaces to various applications deployed remotely. As applications are moved off to the internet it follows that they are accessible to a larger audience, as the cost of running those applications shifts from hosting the applications locally (as with desktop applications) to connecting to the applications.

There is one big difference between ordinary desktops and mobile phones, though: the ergonomics. As the feature set of an application increases, so does the demand for a larger interface in order to interact with that richer feature set (which could be deployed remotely, BTW). This would mean that third world consumers will first interact with an application thru a cheap mobile phone, and access a reduced feature set, and as they grow richer they could afford a tool which would allow them to access the same application thru a different, more expensive and graphically more expressive interface (the same application accessed thru a netbook).

The difference in income would drive the difference in access to the very same application. This could allow for identifying customers with higher incomes by the way they access the same free resources…

No Comments | Tags: Miscellaneous