30 January 2007 - 20:30
Tagging

Tagging is pretty much synonymous with Web 2.0 and collaboration. It seems to pop up all over the place: you tag movies, posts, your bookmarks on del.icio.us, etc… I think it is popular because it requires minimal input from the user: just give us two or three words about this page and we will take it from here. In today’s fast-paced world you cannot ask users to provide significant input; they simply don’t have the time. You also do not have the resources to process significant input, because processing it would require human readers rather than software, and human readers cost a lot. Another reason why tagging is popular is that it lets you categorize an item AFTER you have consumed it and not BEFORE. As a result, you could argue that it splits items into bins a lot better than rigid categories set up up-front.

But this flexibility has an ugly flip side: the low signal-to-noise ratio that comes from many people using the same tags for very different items. There are 40,000 words in the English language which you could use for tags, and probably around ten million web pages tagged in del.icio.us (del.icio.us hit 1 million users some time ago, and I assume each of them has at least 10 tagged items). Do you think that tagging can scale meaningfully at this level? You can try to apply various algorithms to differentiate the items being tagged (number of tags, frequency, time, etc…), but at the end of the day you will end up with very large datasets which need to be processed MANUALLY.
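
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The user count, items per user and vocabulary size are the figures quoted above. The three tags per item and the idea that every word is equally likely to be used are my own simplifying assumptions, added only for illustration.

```python
# Figures quoted in the post.
USERS = 1_000_000        # del.icio.us users
ITEMS_PER_USER = 10      # assumed minimum number of tagged items per user
VOCABULARY = 40_000      # words available as tags

# Illustrative assumptions, not from the post:
TAGS_PER_ITEM = 3        # "two or three words about this page"
# ...and every word is used as a tag equally often.


def tagged_items(users=USERS, items_per_user=ITEMS_PER_USER):
    """Total number of tagged items across all users."""
    return users * items_per_user


def average_items_per_tag(items, tags_per_item, vocabulary):
    """How many items share one tag if tags are spread evenly over the vocabulary."""
    return items * tags_per_item / vocabulary


items = tagged_items()
print(items)                                                      # 10000000
print(average_items_per_tag(items, TAGS_PER_ITEM, VOCABULARY))    # 750.0
```

Even under this generous even-spread assumption, each tag points at hundreds of items. If tag usage is skewed toward a handful of popular words, as I would expect, the popular tags look a lot worse.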

Which brings me to the next problem: the huge cost of false positives. In order to recognize a false positive (a page that has been tagged incorrectly and that doesn’t serve your needs) you need to consume the information, to read it, and this usually takes a lot of time.
I gave up a long time ago on retrieving good information by navigating my del.icio.us tags and trying to find things in common with other people; the amount of information that I have to READ (which is a very expensive process) in order to find something worthwhile is ridiculous. I find that it is a lot more effective to keep reading good sources of information in order to get an understanding of a new field than to data-mine del.icio.us for my needs. It takes more time, but it is a lot less expensive.

One last problem with tagging is that it is a level playing field: a developer with 2 years of experience ranks on the same level as someone who has worked in, let’s say, transaction management for the last 15 years. If both of them tag different pages with the tag ‘transaction-management’, which page do you think better reflects the tag? Actually, let me rephrase that: which page reflects the tag better for whom? The pages tagged by the transaction-management specialist will probably be useful for someone working with high-level transactions and transaction propagation, while the pages tagged by the junior developer will be useful for another junior developer starting to learn about ACID. You could say that tagging works best in a homogeneous environment where everybody expects pretty similar results from the same tag.

Tagging has been with us for a while, but so far it is used for pretty simple associations. I’m still waiting for the “wow” moment…

No Comments | Tags: Miscellaneous

27 January 2007 - 21:14
REST and WS - part 2

A while ago I wrote a post about the differences between REST and WS. I’d like to revisit this issue today.
I like REST conceptually, but I find it too crude. It revolves around 3 or 4 very simple operations which, its proponents claim, scale to the size of the Internet. Well, you can do pretty much everything with these simple operations; it is the value added on top of them (how you mix them up) that really matters. The packages built out of these simple operations should scale, not the operations themselves. It is disingenuous to claim that you can do everything with these operations: you can do exactly the same thing if you code in binary, but I have not seen many people advocating this for a while (even though there probably were quite a few when compilers first appeared ;-)).

The lack of standardization in REST has some subtle side effects. One of them, of course, is that you have to work extra hours, because in REST you have to code the interaction with another party by hand. In WS you can look up the web service, invoke a code generator and immediately generate the connection to that WS; in about 15 seconds you are ready to communicate with the other party. In REST you usually get a spec highlighting the URLs which serve as entry points into the other system, from which you derive the communication schema, and then you go off to implement it. I am sure that quite a few REST developers out there have built their own generators for connecting to a REST-enabled endpoint, but not everyone is proficient in such tasks. WS seriously beats REST when it comes to ease of use.
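
To make the "code it by hand" part concrete, here is a minimal sketch in Python of what such a hand-written REST client tends to look like. Everything specific in it is made up for illustration: the base URL, the `q` and `page` parameters and the JSON shape of the response stand in for whatever the other party's written spec would say.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical entry point, as it might appear in the other party's spec.
BASE_URL = "http://example.com/api"


def build_search_url(query, page=1):
    """Assemble the request URL by hand, following the (hypothetical) spec."""
    return f"{BASE_URL}/search?{urlencode({'q': query, 'page': page})}"


def parse_search_response(body):
    """Pull the fields we care about out of the payload, again by hand."""
    payload = json.loads(body)
    return [item["title"] for item in payload["results"]]


def search(query, page=1):
    """Full round trip; it needs a live endpoint, so it is not called here."""
    with urlopen(build_search_url(query, page)) as response:
        return parse_search_response(response.read().decode("utf-8"))


print(build_search_url("transaction management"))
# http://example.com/api/search?q=transaction+management&page=1
```

None of this is hard, but every line of it is derived from reading a document rather than generated from a machine-readable description, and it has to be redone for each new service.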

What I like about REST is that the URLs can act as a level of indirection, which in turn makes customization a lot easier. Small example: user A and user B use the same URLs from the same REST service, but user A has a platinum account while user B has a bronze account. Well, the application could serve more or better data to user A (more search results, for example) through the same URL by looking up the user’s account and making all sorts of decisions. In WS you cannot do this right now: you do not have the concept of a session yet (even if WS-Conversation is being spec-ed out), and most implementations map a class or session bean to an endpoint and not the other way around (for an endpoint to become a level of indirection, the endpoint has to be easily mapped to a process, not vice versa). The good thing about WS is that when this conversational feature gets implemented, it will be implemented transparently.
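
The platinum/bronze example can be sketched in a few lines of Python. The tiers, the result limits, the user names and the `/search` resource are all hypothetical; the only point is that both users hit the same URL and the decision happens behind it.

```python
# Hypothetical account data and limits; none of this comes from a real service.
RESULT_LIMITS = {"platinum": 50, "bronze": 10}
ACCOUNTS = {"user_a": "platinum", "user_b": "bronze"}
CATALOG = [f"result-{n}" for n in range(1, 101)]


def handle_search(url, user):
    """Serve the same URL to everyone; decide how much to return per account."""
    if url != "/search":
        raise ValueError(f"unknown resource: {url}")
    tier = ACCOUNTS.get(user, "bronze")   # unknown users get the lowest tier
    return CATALOG[:RESULT_LIMITS[tier]]


print(len(handle_search("/search", "user_a")))  # 50
print(len(handle_search("/search", "user_b")))  # 10
```

The client code for user A and user B is identical. Changing what a tier gets is a server-side change that no client has to know about.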
One bad thing about WS is that it seems to have set its mind on spec-ing out every conceivable form of communication. The spec is starting to mushroom and is being stretched in every direction in order to cover every God-forsaken issue. This is obviously a problem, but I think the market will solve it. It is a pretty simple case of supply and demand: if an issue is in big demand (in other words, it is considered important by the majority of OASIS members), OASIS (the supply side) will produce a quality spec covering it and the app server vendors will implement this spec. If the demand for an issue is low, it will not get as much attention from OASIS, and it will either be put on the back burner or get a low-quality spec (reflecting the lack of resources dedicated to it and the poverty of use cases due to little input).
As far as the vendors are concerned, they will concentrate on what is business-relevant and drop the rest of the protocols, simply because they will not have the resources to implement every spec that comes out of OASIS. It doesn’t make sense to implement things your customers don’t need only because they happen to be on a piece of paper that some committee wrote (it would make a lot more sense to have third parties implement the most obscure parts of the WS spec and have the application server vendors plug these implementations into their app servers, but this is a side thought). I think that the vendors will focus on what is needed most (WS-Transaction being a good example) and then delegate the rest of the spec either to a third party, which could be open source projects under the OASIS umbrella, or to the recycle bin ;-). I hope that the vendor community will take this market-based approach to the WS specs coming out of OASIS, since it is a very flexible and efficient approach, and will not get mired in a sea of meaningless and purposeless specs.

P.S. Well, you probably got used to this by now, but this post was also written in a hurry. I’m sorry, can’t do much about it.

No Comments | Tags: Development, Favorites

8 January 2007 - 23:37
New category

I created a new category “Econo-computing” which addresses issues in the computing field seen from an economics point of view. I am not sure what will go in this category, but I find this field interesting and I will continue to explore it.

No Comments | Tags: Econo-computing

8 January 2007 - 23:36
Supply and demand in a computing environment

In a previous post I was talking about how the demand for interceptions in an application was not properly met by the supply until AOP came about and created a scalable process that could handle interceptions efficiently. I am thinking that you could generalize this case and state that a mismatch between the demand for a behavior or feature and the supply of that behavior or feature betrays an inefficiency in the computing environment (language + IDEs + frameworks + containers + etc…) that tried to address the issue. In the example above, implementing interceptions following strict OOP concepts was certainly possible, but it was not feasible.
In a market environment, high demand that cannot be met by supply usually translates into a high price. In the above example the high price would have been the cost of developing and, even more, maintaining the system in which interceptions were implemented in pure OOP fashion. In the example of OOP done in C, the cost would have been the cost of coordinating a team of programmers and the cost of turning a programmer into a human compiler.

Mismatches between demand and supply in a computing environment sometimes appear as the inability to scale. In the interceptions example the inability to scale resulted in a mushrooming of sub-classes which would basically re-implement the original methods with some interception logic around them. In the example of OOP done in C, the inability to scale was the inability to have a large team of developers follow coding rules that would have resulted in OOP-style code. These 2 examples could not scale; in other words, as the demand (for interceptions) was increasing, the supply (the OOP environment in which these interceptions were coded) could not keep up. Inability to scale a behavior is basically a facet of a mismatch between demand and supply for a particular feature.
It would be interesting to see how you could address these mismatches. Looking at how the interception and OOP-in-C problems were addressed (by AOP and C++ respectively), it would appear that it requires a large investment of some kind. This investment, which usually takes the form of a framework that addresses the problem, is strategic to a certain extent, since it will basically keep your costs down for a certain period of time.

P.S. I know that I am stretching some concepts over here, but quite frankly I don’t care. I’ll be continuing to explore this field as I find it pretty interesting.
P.P.S. I also wrote this post in a hurry; I may come back to it. I gotta go right now.

No Comments | Tags: Econo-computing, Favorites, Management

8 January 2007 - 16:20
Interceptors and scalability

I was thinking about the roots of AOP, about what makes it different from other programming paradigms, about what defines it, and it came down to interceptions. AOP is pretty much about intercepting method calls and doing something before or after. This interception mechanism grew into a full-fledged language in the case of AspectJ or into a full-fledged binding mechanism in the case of Spring AOP.
Intercepting method calls was doable in plain OOP, but it was pretty hard to do. You could, for example, extend a class and override a method with a method that would intercept the original call, do something around it and then call the original method. You could do this, but it could not scale. You could not do this for dozens of methods, and you could not turn an interceptor on or off as your application needed. In comes AOP, which takes this interception mechanism and builds a whole environment around it so that you can apply interceptions as you need. Both AOP environments I mentioned (AspectJ and Spring AOP) scale very well, so they can match the demand for interceptions with the supply, either through a programming language, as in the case of AspectJ, or declaratively, as in the case of Spring AOP.
So I would say that AOP is the main way to implement interception-based programming in an OOP environment.
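
Here is a small Python sketch of the contrast described above. It is neither AspectJ nor Spring AOP, just a plain decorator standing in for the idea, and all class and method names are hypothetical. The first half is the "extend and override" approach; the second half pulls the interception out into one reusable piece that can be switched on and off.

```python
import functools


class AccountService:
    def transfer(self, amount):
        return f"transferred {amount}"


# Plain OOP: one hand-written override per intercepted method.
class LoggingAccountService(AccountService):
    def __init__(self):
        self.log = []

    def transfer(self, amount):
        self.log.append(f"before transfer({amount})")
        result = super().transfer(amount)   # call the original method
        self.log.append(f"after transfer({amount})")
        return result


# Interception factored out: write it once, apply it to any method,
# and turn it on or off at runtime.
class Interceptor:
    def __init__(self, enabled=True):
        self.enabled = enabled
        self.log = []

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not self.enabled:
                return func(*args, **kwargs)
            self.log.append(f"before {func.__name__}")
            result = func(*args, **kwargs)
            self.log.append(f"after {func.__name__}")
            return result
        return wrapper


audit = Interceptor()


class OrderService:
    @audit
    def place(self, item):
        return f"placed {item}"

    @audit
    def cancel(self, item):
        return f"cancelled {item}"


orders = OrderService()
orders.place("book")
audit.enabled = False      # switch the interceptor off
orders.cancel("book")
print(audit.log)           # ['before place', 'after place']
```

With the subclass approach every new intercepted method means another override. With the second approach it is one extra line per method, which is roughly the kind of supply that AOP environments provide on a much larger scale.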

P.S. I wrote this post in a hurry, and unfortunately I could not expand on some aspects ;-) of this (the supply and demand of interceptions was something I wanted to explore in more detail). I will probably revisit this post later.

1 Comment | Tags: Development, Econo-computing