Saturday, November 01, 2008

Introduction to Carrot2 Clustering Engine/API


I was quite impressed by the easy-to-understand yet powerful facilities provided by this component-based clustering engine; thanks to its developers for releasing its source code. Unlike this open-source engine, there is another clustering engine by Vivisimo, called Vivisimo Velocity, which is commercial and hence of no help to us. Anyway, to see the best that can be done with Vivisimo Velocity, try Clusty, which efficiently clusters search results using the Vivisimo clustering technology. However, Clusty isn't state of the art at coming up with semantically near-perfect cluster labels, as discussed here.

Anyway, let's appreciate open source; back to Carrot2.

So, Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize (cluster) search results into thematic categories, called clusters.
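To make the idea of "thematic categories" concrete, here is a tiny toy sketch in Python. It is not Lingo or any other Carrot2 algorithm, and none of the names below come from the Carrot2 API; the function `cluster_by_frequent_term`, the stopword list and the sample snippets are all made up for illustration. It simply groups snippets under the most common term they share with at least one other snippet, ignoring the query words themselves.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "to", "is", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric words that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def cluster_by_frequent_term(snippets, query):
    """Group snippets by the most widely shared non-query term (toy heuristic)."""
    query_terms = set(tokenize(query))
    doc_terms = [set(tokenize(s)) - query_terms for s in snippets]

    # Document frequency: in how many snippets does each term occur?
    df = Counter(term for terms in doc_terms for term in terms)

    clusters = {}
    for snippet, terms in zip(snippets, doc_terms):
        shared = [t for t in terms if df[t] > 1]
        # Most frequent shared term wins; ties are broken alphabetically.
        label = min(shared, key=lambda t: (-df[t], t)) if shared else "other topics"
        clusters.setdefault(label, []).append(snippet)
    return clusters


if __name__ == "__main__":
    sample = [
        "Jaguar cars for sale",
        "New Jaguar cars reviewed",
        "Jaguar habitat in the rainforest",
        "Rainforest habitat of the jaguar cat",
        "Jaguar operating system",
    ]
    for label, members in cluster_by_frequent_term(sample, "jaguar").items():
        print(label, "->", members)
```

A real engine has to do much more than this (phrase extraction, label quality, overlapping clusters), but the input and output have the same shape: a flat list of results goes in, labelled groups come out.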

Carrot2 provides an architecture for acquiring search results from various sources (Yahoo API, Google API (deprecated), MSN Search API, eTools Meta Search, Alexa Web Search, PubMed, OpenSearch, Lucene index, Solr), clustering the results and visualizing the clusters. Currently, five clustering algorithms are available, each suited to different kinds of document clustering tasks.

The architecture of Carrot2 is based on a pipeline of components of three types: input components, filter components and visualization components. The task of input components is to provide search results for clustering based on a user query. Filter components transform the results in some way (e.g. apply clustering or case normalization), and the visualization components render the transformed results for the user.
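The three-stage pipeline described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Carrot2 code: `canned_input`, `normalize_case`, `group_by_keyword`, `render_text` and `run_pipeline` are hypothetical names invented for this sketch, the "search results" are hard-coded, and the grouping filter is a trivial keyword match standing in for a real clustering algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Result:
    title: str
    snippet: str


def canned_input(query: str) -> List[Result]:
    """Input component: hard-coded stand-in for a call to a search API."""
    corpus = [
        Result("Apple Pie Recipe", "Bake an apple pie at home"),
        Result("Apple Inc. News", "Apple releases a new laptop"),
        Result("Banana Bread", "Easy banana bread recipe"),
    ]
    q = query.lower()
    return [r for r in corpus if q in (r.title + " " + r.snippet).lower()]


def normalize_case(results: List[Result]) -> List[Result]:
    """Filter component: case normalization."""
    return [Result(r.title.lower(), r.snippet.lower()) for r in results]


def group_by_keyword(results: List[Result]) -> Dict[str, List[Result]]:
    """Filter component: trivial stand-in for clustering (expects lowercase text)."""
    keywords = ("recipe", "news")
    groups: Dict[str, List[Result]] = {}
    for r in results:
        text = r.title + " " + r.snippet
        label = next((k for k in keywords if k in text), "other")
        groups.setdefault(label, []).append(r)
    return groups


def render_text(groups: Dict[str, List[Result]]) -> str:
    """Visualization component: render the groups as indented plain text."""
    lines = []
    for label, members in groups.items():
        lines.append(f"{label} ({len(members)})")
        lines.extend(f"  - {r.title}" for r in members)
    return "\n".join(lines)


def run_pipeline(query, source, filters, visualizer):
    """Feed the query to the input, pass the data through each filter, then render."""
    data = source(query)
    for apply_filter in filters:
        data = apply_filter(data)
    return visualizer(data)


if __name__ == "__main__":
    print(run_pipeline("apple", canned_input,
                       [normalize_case, group_by_keyword], render_text))
```

Note that the order of the filters matters here: the keyword grouping only matches because case normalization has already run. Swapping a component (a different source, a different renderer) does not touch the others, which is the point of the pipeline design.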



I have successfully walked through the most basic example application of this API. It simply uses the Yahoo API and the Lingo clustering algorithm to cluster the results of a given query. The best way to understand the mechanics of Carrot2 and make the most of its abilities is to follow the code while studying the comprehensively written Javadoc documentation.

So that's what I did, but for now it's about time for me to shut down my brain for the next 4-5 hours. In the next part I'll continue this post and try to put together a precise summary drawn from those Javadoc comments, explaining in detail the things one needs to know to get started with Carrot2!

For now, over and out!
