mahout-user mailing list archives

From Paul Jones <>
Subject newbie question: LSA analysis + others
Date Wed, 17 Jun 2009 02:04:08 GMT
Hi to one and all

First time on this list. I have read through the wiki, FAQ and other docs, but before I dive
further into Mahout I have a few questions, or should I say clarifications.
I am looking for a system which would allow me to:

1. Take a set of words
2. Build clusters of these words, i.e. work out the semantic relationships between the words
(I guess I could use WordNet as a starter)
3. Once clusters of words have been formed, also work out the relationships between the clusters

So in essence I could work out that red is similar to crimson, and hence a search on red
would return docs with crimson in them even though red is not mentioned.
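
To make the red/crimson example concrete, here is a rough sketch of the behaviour I have in mind,
in plain Python rather than Mahout. It is not LSI: it just counts which words share a document
and compares those counts with cosine similarity. The toy documents, the function names and the
0.9 threshold are all made up for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus, invented purely for illustration.
DOCS = {
    "d1": "the red rose has bright petals",
    "d2": "the crimson rose has bright petals",
    "d3": "the stock market fell sharply today",
}


def context_vectors(docs):
    """Map each word to counts of the other words it shares a document with."""
    vectors = defaultdict(Counter)
    for text in docs.values():
        words = text.split()
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_words(word, vectors, threshold):
    """Other words whose context vector is at least `threshold` similar to `word`'s."""
    target = vectors[word]
    return sorted(
        w for w, vec in vectors.items()
        if w != word and cosine(target, vec) >= threshold
    )


def search(query, docs, vectors, threshold):
    """Ids of docs containing the query word or any word judged similar to it."""
    terms = {query} | set(similar_words(query, vectors, threshold))
    return sorted(doc_id for doc_id, text in docs.items() if terms & set(text.split()))


if __name__ == "__main__":
    vectors = context_vectors(DOCS)
    print(similar_words("red", vectors, 0.9))  # ['crimson']
    print(search("red", DOCS, vectors, 0.9))   # ['d1', 'd2']
```

Here d2 comes back for a search on red even though it only mentions crimson. The threshold has
to be high in this toy case because common words such as "the" also score fairly well against
red, which I assume is one reason proper term weighting or something like LSI would be needed
on real data.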

Would Mahout work here?

Of course, prior to this there is the problem of cleaning up the data, i.e. stemming etc.

Now I have read several detailed papers on clustering, ranking, etc, and of course some algos
are better than others, but to me a platform like Mahout seems interesting since you can deploy
the existing ones in the system, and also later on add others.

Looking at the algorithms, it seems that LSI (PLSI) has not been implemented yet. If so,
which other algo would "suffice" in this case? Admittedly my knowledge of algos is poor, to
say the least :-). Also, where would Lucene fit in (if it does)? Would it be used to search
the results after the algos have been applied? It seems as if Lucene just uses a weighting
system to create the index. Or can Mahout do it all?

As you can see, I am confused, but this is my first pass at this system.



P.S. Are any of the algos feedback algos, i.e. ones where someone could improve results using
user feedback?
