mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: newbie question: LSA analysis + others
Date Thu, 18 Jun 2009 15:40:43 GMT

On Jun 18, 2009, at 11:18 AM, Paul Jones wrote:

> Okay, I have brain freeze reading the email below :-)
>
> I think PLSI will do what I want (or is a great starting point). I am
> looking at a Hadoop install with Mahout on top; is there any need
> for Lucene?

I haven't looked at the PLSI Pig thing yet, but I've been using
Mahout's Lucene code to produce Vectors from a Lucene index. So if
you already have your own Vectors/Matrix, there's no need for Lucene.
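For the "bring your own Vectors/Matrix" route, a minimal plain-Java sketch of turning raw text into a sparse term-document matrix might look like the following. It is only an illustration under stated assumptions: it uses neither Lucene nor Mahout's own Vector/Matrix classes, and the class name `TermDocMatrix`, the regex tokenizer, and the raw-count weighting are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: build a sparse term-document matrix from raw text
 * without Lucene. This is NOT Mahout's Vector/Matrix API.
 */
public class TermDocMatrix {
    // term -> column index, assigned in order of first appearance
    final Map<String, Integer> dictionary = new LinkedHashMap<>();
    // one sparse row per document: column index -> raw term count
    final List<Map<Integer, Integer>> rows = new ArrayList<>();

    // Lowercase, then split on any run of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) { // a leading delimiter yields an empty first token
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Append one document as a new sparse row, growing the dictionary as needed.
    void addDocument(String text) {
        Map<Integer, Integer> row = new TreeMap<>();
        for (String term : tokenize(text)) {
            Integer col = dictionary.get(term);
            if (col == null) { // unseen term: give it the next free column
                col = dictionary.size();
                dictionary.put(term, col);
            }
            row.merge(col, 1, Integer::sum);
        }
        rows.add(row);
    }

    public static void main(String[] args) {
        TermDocMatrix m = new TermDocMatrix();
        m.addDocument("Mahout runs on Hadoop.");
        m.addDocument("Mahout clusters vectors; Hadoop stores them.");
        // {mahout=0, runs=1, on=2, hadoop=3, clusters=4, vectors=5, stores=6, them=7}
        System.out.println(m.dictionary);
        // [{0=1, 1=1, 2=1, 3=1}, {0=1, 3=1, 4=1, 5=1, 6=1, 7=1}]
        System.out.println(m.rows);
    }
}
```

Each row would still need converting into whatever vector type your clustering or LSA code expects, and you might prefer TF-IDF weights over raw counts.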


>
> Also, is there a "dummies" guide to all these algos, i.e., which are
> clustering algos, which are indexing, which are for "abc"? I am
> reading a ton of information and am not 100% sure which
> categories they all fit into... hope the question is not too vague.

The Wiki is the place to start. I've been working on a page that
will cover the clustering stuff, but it's far from complete:
http://cwiki.apache.org/confluence/display/MAHOUT/ClusteringYourData

As for indexing, I'm not sure what you mean. If you mean indexing
as in Lucene, Mahout has no code for that.

FWIW, in answer to your original question, I've seen some people do
some interesting things applying graph theory (ranking, etc.) to the
relationships between words.
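One possible way to rank words by their relationships is a PageRank-style random walk over a word co-occurrence graph, similar in spirit to TextRank. This is a hypothetical sketch of that general idea, not the specific work mentioned above; the class `WordRank`, the adjacency window, the damping factor of 0.85, and the iteration count are all illustrative assumptions.

```java
import java.util.*;

/** Hypothetical sketch: rank words with PageRank over a co-occurrence graph. */
public class WordRank {
    // Undirected graph: tokens closer than `window` positions apart are linked.
    static Map<String, Set<String>> buildGraph(List<String> tokens, int window) {
        Map<String, Set<String>> graph = new TreeMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            String a = tokens.get(i);
            graph.computeIfAbsent(a, k -> new TreeSet<>());
            for (int j = i + 1; j < Math.min(i + window, tokens.size()); j++) {
                String b = tokens.get(j);
                if (a.equals(b)) continue; // no self-loops
                graph.get(a).add(b);
                graph.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
            }
        }
        return graph;
    }

    // Power iteration; mass on words with no neighbours is spread uniformly.
    static Map<String, Double> pageRank(Map<String, Set<String>> graph,
                                        double damping, int iterations) {
        int n = graph.size();
        Map<String, Double> rank = new HashMap<>();
        for (String v : graph.keySet()) rank.put(v, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double dangling = 0;
            for (String v : graph.keySet()) {
                if (graph.get(v).isEmpty()) dangling += rank.get(v);
            }
            double base = (1 - damping) / n + damping * dangling / n;
            Map<String, Double> next = new HashMap<>();
            for (String v : graph.keySet()) next.put(v, base);
            for (Map.Entry<String, Set<String>> e : graph.entrySet()) {
                Set<String> nbrs = e.getValue();
                if (nbrs.isEmpty()) continue; // already redistributed via `base`
                double share = damping * rank.get(e.getKey()) / nbrs.size();
                for (String u : nbrs) next.merge(u, share, Double::sum);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "mahout", "clusters", "vectors", "lucene", "indexes", "text",
            "mahout", "builds", "vectors", "from", "text");
        Map<String, Double> rank = pageRank(buildGraph(tokens, 2), 0.85, 50);
        // Print the three highest-ranked words.
        rank.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(3)
            .forEach(e -> System.out.printf("%s %.3f%n", e.getKey(), e.getValue()));
    }
}
```

A real system would typically remove stop words first and might weight edges by co-occurrence counts; both are left out here to keep the sketch short.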

-Grant

