Thursday, November 13, 2008

If you can't look him straight in the eye

Following is the poem that Chuck Hagel, Nebraska's senior U.S. Senator read out in his Farewell to the U.S. Senate on 2nd October, 2008. The poem's actual author for some is unknown, but this site claims that it's Peter "Dale" Wimbrow Sr. Anyways, it's so perfectly written that none can just ignore it; yet, we can only wish..

When you get what you want in your struggle for self

And the world makes you king for a day,

Just go to the mirror and look at yourself

And see what that man has to say.

For it isn't your father or mother or wife

Whose judgment upon you must pass.

The fellow whose verdict counts most in your life

Is the one staring back from the glass.

You may be like Jack Horner and chisel a plum

And think you're a wonderful guy.

But the man in the glass says you're only a bum

If you can't look him straight in the eye.

He's the fellow to please--never mind all the rest,

For he's with you clear to the end.

And you've passed your most dangerous, difficult test

If the man in the glass is your friend.

You may fool the whole world down the pathway of years

And get pats on the back as you pass.

But your final reward will be heartache and tears

If you've cheated the man in the glass.

[source: http://hagel.senate.gov/public/index.cfm?FuseAction=Home.FarewellSpeech]

Friday, November 07, 2008

Subversion – Introduction

I have been working with Subversion for a while now, but I started right out of an immediate need, and hence without any conceptual background. This time however, a lack of some basic understanding hindered my learning, so, I took out some time to skim through the most reliable source of information. I found this Subversion book titled as "Version Control with Subversion", which can be downloaded here. It's time consuming to go through a 400 pages book to learn about something secondary in importance to my (or any developer's) work, so, I thought it might be useful to summarize things I'll read right away.

Following is the summarized selection I have extracted from the first chapter of the book..

A version control system tries to enable collaborative editing and sharing of data. Subversion is one such system. Different systems use different technique to implement this collaborative environment; even Subversion supports a couple of different methods. It can manage any sort of file collections (not limited to source-code only). At its core, just like any other version control system, there is a repository; it stores information in the form of a file-system tree of files and directories (just like a typical file server). Clients connect to the repository, to read and write files. The operations are synchronized, and each client sees the latest version of the files stored on the repository.

What distinguishes it from a file server, however, is its ability to remember all the changes ever made to any of the files or directories, as well changes in the directory structure and addition and deletion of files. The fundamental problem faced by all version control systems is as questioned: how will the system allow users to share information, but prevent them from accidentally stepping on each other's feet? It's all too easy for users to accidentally overwrite each other's changes in the repository.

In sum, the problem reduces to the following questions: How the latest version of a file should represent all the changes made by some writers, when some reader is reading it? Anyways, two solutions have been proposed to this problem:

The Lock-Modify-Unlock Solution

Three problems:

  1. Locking may cause administrative problems – you lock a file and go on vacation
  2. Locking may cause unnecessary serialization – both need to modify different portions
  3. Locking may create a false sense of security – lock A, then ask for B; lock B, then ask for A

The Copy-Modify-Merge Solution (Used by Subversion and many other systems)

In the fourth action, Harry failed to write the file back to repository because he had an older version (which he was modifying) as compared to the one currently on the latest repository. This is where the concept of “merge” needs to be introduced. So, once Harry downloads the latest repository version, he merges his changes done in the older version into this downloaded latest version, and, writes this merged file back to the repository (A* above). Finally, Sally can now read this updated file that has her modifications as well as that of Harry’s. This solution is much preferred over The Lock-Modify-Unlock Solution in many cases except when the files are sound files or binary files where it will become almost impossible to ensure consistency of the changes made by multiple clients at the same time.

SUBVERSION - IN ACTION

Repository URLs You can access Subversion repositories through many different methods—on local disk or through various network protocols, depending on how your administrator has set things up for you. A repository location, however, is always a URL. Table 1.1, “Repository access URLs” describes how different URL schemes map to the available access methods. Working Copies Subversion has this concept of a working copy, which is essentially an up to date copy of the project source-code that you’ll download (or checkout) from the underlying repository. If the repository has multiple projects, then you’ll need to specifically mention the exact URL of the project subdirectory while issuing a check-out command to SVN, say something like this:

svn checkout http://svn.example.com/repository/my_project

Now, there are two possibilities:

  1. You can checkout/download the source-code of my_project in some ordinary directory; you’ll need to get some typical SVN client to do so. I use TortoiseSVN, which can be downloaded here.
  2. Alternatively, you can checkout/download the source-code into some workspace project’s source folder of the IDE you use for development. I use Eclipse Ganymede these days, and the Subclipse v1.4 plugin does the job for me; it can be downloaded here.

It’s easy to get used to the synchronization mechanisms adopted by Subversion. Most of the times, the only operations one shall use as a developer are Commit, Update, Merge, Compare, Restore, etc. I won’t get into details of each command here (respecting the order of knowledge in the book). Once you’ve checked out the source-code, it’s time for you to play around with it just like you can with any of your local projects. Important thing to note here is that whatever you’ll modify will not have an effect on the original source-code in the repository, and hence the name ‘working copy’. But, at some point in time, you’ll need to incorporate (or commit) your changes into the original files in the project repository. This operation is known as COMMIT, and to do this commit, you’ll execute a command like this:

svn commit MyMainClass.java -m "Fixed a bug in main class"

If all goes good, you’ll be shown a confirmation that your changes have been committed and the repository is now updated. So, the next time you’ll check-out the same my_project code from the repository, you can witness your updated MyMainClass.java code. There is just one last thing that needs to be understood. Consider the following case:

  1. You and some other developer checkout my_project at the same time (hence the same version).
  2. You make changes to MyMainClass.java’s methodX, and, you commit the code.
  3. Then, the other developer makes his/her changes to the same class’s methodY, and tries to commit. This commit however, will fail.

Reason: he/she is trying to commit a modified yet NOT up to date version of the file to the repository. It’s no more up to date, because the latest on repository is YOUR’S MyMainClass.java.

To get over this problem, the other developer will issue an update command, like this:

svn update

The svn will then automatically TRY to update the working copy of this developer by incorporating the changes made by you (or any other changes in the latest version) . The developer, however, is required to do some manual modifications, if he was also updating the same method (i.e. method) or section of code.

Switching to Carrot2.. for now, over & out.

Sunday, November 02, 2008

Stage set - Hillary, Incominggg..

Venue: George Mason University, Fairfax, VA
Date: 2nd November, 2008

Saturday, November 01, 2008

Introduction to Carrot2 Clustering Engine/API


I was pretty much impressed by the easily comprehensible yet powerful facilities provided by this component based clustering engine; thanks to it's devleopers for releasing it's source code. Unlike this opensource engine, there is another clustering engine by Vivisimo, called Vivisimo Velocity, which is commerical and hence helps us in no way. Anyways, to see the best that can be done with VV, try Clusty, which efficiently clusters search results using the Vivisimo clustering technology. However, clusty isn't state of the art in coming up with semantically near-perfection cluster label names, as discussed here.

Anyways, let's appreciate opensource - back to Carrot2.

So, Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize (cluster) search results into thematic categories, called clusters.

Carrot2 provides an architecture for acquiring search results from various sources (YahooAPI, GoogleAPI(deprecated), MSN Search API, eTools Meta Search, Alexa Web Search, PubMed, OpenSearch, Lucene index, SOLR), clustering the results and visualising the clusters. Currently, 5 clustering algorithms are available that are suitable for different kinds of document clustering tasks.










The architecture of Carrot2 is based on a pipeline of components of three types: input components, filter components and visualization components. The task of input components is to provide search results for clustering based on a user query. Filter components transform the results in some way (e.g. apply clustering, case normalization), and the visualisation components render the transformed results for the user.



I have successfully walked through the most basic example application of this api. It simply uses yahoo api and the lingo clustering algorithm to cluster the results of a certain query. The best way to understand the mechanics of carrot2 and to make the most out of it's abilites, one needs to follow the code while studying the comprehensively written javadoc documentation.

So, I did so.. but for now it's about time for me to shutdown my brain for next 4-5 hours.. I'll continue this post and will try to put forward a precise text extracted out of those comments in the next part, explaining in detail things one needs to know to get started with carrot2!

for now, over and out!