On Apr 29, 2009, at 10:27 AM, Shashikant Kore wrote:

> Hi Jeff,
>
> The JDK problem occurs while running the example of Synthetic
> Control Data from
> http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html
>
>
> The other query was related to how to convert text files to
> Mahout Vector. Let's say, I have text files of Wikipedia pages and now
> I want to create clusters out of them. How do I get the Mahout vector
> from the Lucene index? Can you point me to some theory behind it, from
> where I can convert it to code?

I don't think we have any demo code for this yet. I have a personal task that I'm trying to get to that will demonstrate how to cluster text starting from a plain text file, but nothing in code yet, especially not anything that takes it from Lucene. All of these would be great additions to have.

I think Richard Tomsett said he had some code to do it, but he hasn't donated it yet. He has also put up a patch for a cosine distance metric, but it is not committed yet.

Cheers,
Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
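
On the theory asked about in the quoted question: a common starting point for turning text into vectors is TF-IDF weighting, with cosine distance used to compare the resulting vectors. The sketch below is plain Python and only illustrates the idea. It is not Mahout or Lucene code, and it is not the patch or the unreleased code mentioned above. The names `tokenize`, `tfidf_vectors`, and `cosine_distance` are made up for this example, the tokenizer is deliberately naive, and the smoothed IDF formula is one common variant among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse vector (dict of term -> weight) per input document."""
    term_counts = [Counter(tokenize(doc)) for doc in docs]

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    vectors = []
    for counts in term_counts:
        # Assumption: smoothed IDF, so a term found in every document
        # still gets a small positive weight instead of zero.
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        })
    return vectors


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen here: an empty vector is maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "Apache Lucene is a search library",
        "Lucene builds a search index",
        "Clustering groups similar points",
    ]
    vectors = tfidf_vectors(docs)
    print(cosine_distance(vectors[0], vectors[1]))  # documents sharing terms
    print(cosine_distance(vectors[0], vectors[2]))  # documents with no shared terms
```

Vectors like these are sparse (most terms do not occur in most documents), which is why a dictionary keyed by term is used here. A clustering algorithm would then take the vectors and a distance function such as `cosine_distance` as its inputs.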