mahout-user mailing list archives

From Grant Ingersoll
Subject Re: Failure to run Clustering example
Date Thu, 30 Apr 2009 23:32:59 GMT

On Apr 29, 2009, at 10:27 AM, Shashikant Kore wrote:

> Hi Jeff,
> The JDK problem occurs while running the example of Synthetic  
> Control Data from
> The other query was related to how to convert text files to
> Mahout Vector. Let's say, I have text files of wikipedia pages and now
> I want to create clusters out of them. How do I get the Mahout vector
> from the lucene index? Can you point me to some theory behind it, from
> where I can convert it to code?

I don't think we have any demo code for this yet.  I have a personal  
task that I'm trying to get to that will demonstrate how to cluster  
text starting from a plain text file, but nothing in code yet,  
especially not anything that takes it from Lucene.  All of these would  
be great additions to have.  I think Richard Tomsett said he had some  
code to do it, but hasn't donated it yet.  He's also put up a patch  
for a cosine distance metric, but it has not been committed yet.
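
To illustrate the general idea behind the question (not Mahout code, and not the patch mentioned above), here is a minimal sketch in plain Python. It assumes the simplest possible representation: each document becomes a sparse vector of raw term counts, and documents are compared by cosine distance. The function names `term_frequencies` and `cosine_distance`, the tokenizer, and the sample documents are all made up for this example; a real pipeline would likely add stop-word removal and TF-IDF weighting.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an empty document is
        # treated as maximally distant from everything.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical sample documents, standing in for text files.
docs = {
    "a": "clustering groups similar documents",
    "b": "clustering groups similar pages",
    "c": "hadoop runs mapreduce jobs",
}
vectors = {name: term_frequencies(text) for name, text in docs.items()}

print(cosine_distance(vectors["a"], vectors["b"]))  # small: three of four terms shared
print(cosine_distance(vectors["a"], vectors["c"]))  # 1.0: no terms in common
```

Documents that share most of their terms end up close together and documents with no common terms end up at distance 1.0, which is the property a clustering algorithm would rely on.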


Grant Ingersoll

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
