30 January 2007 - 20:30
Tagging
Tagging is pretty much synonymous with Web 2.0 and collaboration. It seems to pop up all over the place: you tag movies, posts, your bookmarks on del.icio.us, etc… I think it is popular because it requires minimal input from the user: just give us two or three words about this page and we will take it from here. In today’s fast-paced world you cannot ask users to provide significant input; they simply don’t have the time. You also do not have the resources to process significant input, because processing it would require human readers rather than software, and human readers cost a lot. Another reason tagging is popular is that it is a way of categorizing an item AFTER you have consumed it and not BEFORE. As a result, you could argue that it splits items into bins a lot better than setting up rigid categories up front.
But this flexibility has an ugly flip side: a low signal-to-noise ratio, which comes from many people using the same tags for very different items. There are maybe 40,000 words in everyday English that you could use as tags, and probably ten million or more web pages tagged in del.icio.us (del.icio.us hit 1 million users some time ago, and I assume each of them has at least 10 tagged items). Do you think that tagging can scale meaningfully at this level? You can try to apply various algorithms to differentiate the items being tagged (number of tags, frequency, time, etc…), but at the end of the day you will end up with very large datasets which need to be processed MANUALLY.
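To put the rough numbers above in perspective, here is a back-of-the-envelope sketch. Every figure in it is an assumption lifted from this post (1 million users, 10 items each, about 3 tags per item, 40,000 candidate words), not measured del.icio.us data, and it pretends that every bookmark is a distinct page and that tags are used uniformly, which is almost certainly too generous.

```python
# Back-of-the-envelope estimate. All numbers are assumptions taken from the
# rough figures in this post, not real del.icio.us statistics.
USERS = 1_000_000      # "hit 1 million users"
ITEMS_PER_USER = 10    # assumed minimum number of tagged items per user
TAGS_PER_ITEM = 3      # "2, 3 words about this page"
VOCABULARY = 40_000    # candidate words that could serve as tags


def average_items_per_tag(users, items_per_user, tags_per_item, vocabulary):
    """Average number of tagged items behind one tag, assuming uniform tag use."""
    tag_assignments = users * items_per_user * tags_per_item
    return tag_assignments / vocabulary


if __name__ == "__main__":
    print(average_items_per_tag(USERS, ITEMS_PER_USER, TAGS_PER_ITEM, VOCABULARY))
```

Under these assumptions an average tag points at 750 items. Real tag usage is likely far more skewed, so a popular tag would sit on top of a much bigger pile than that.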
Which brings me to the next problem: the huge cost of false positives. To recognize a false positive (a page that has been tagged incorrectly and doesn’t serve your needs) you need to consume the information, to read it, and this usually takes a lot of time.
I gave up a long time ago on retrieving good information by navigating my del.icio.us tags and trying to find things in common with other people: the amount of information that I have to READ (which is a very expensive process) in order to find something worthwhile is ridiculous. I find it a lot more effective to keep reading good sources of information to get an understanding of a new field rather than data-mine del.icio.us for my needs. It takes more time, but it is a lot less expensive in wasted effort.
One last problem with tagging is that it is a level playing field: a developer with 2 years of experience ranks on the same level as someone who has worked in, let’s say, transaction management for the last 15 years. If both of them tag different pages with the tag ‘transaction-management’, which page do you think better reflects the tag? Actually, let me rephrase that: which page reflects the tag better for whom? The pages tagged by the transaction-management specialist will probably be useful for someone working with high-level transactions and transaction propagation, while the pages tagged by the junior developer will be useful for another junior developer starting to learn about ACID. You could say that tagging works best in a homogeneous environment where everybody expects pretty similar results from the same tag.
Tagging has been with us for a while, but so far it has been used only for pretty simple associations. I’m still waiting for the “wow” moment…