mahout-user mailing list archives

From Suneel Marthi <suneel_mar...@yahoo.com>
Subject Re: Using EnglishAnalyzer in KMeans
Date Wed, 05 Feb 2014 15:27:20 GMT
You should stop using Mahout 0.5 and switch to Mahout 0.8 or 0.9, for the following reasons:

a)  Mahout 0.5 is past its shelf life and has been purged from all Apache mirrors, so it is
no longer available for download.
b)  Mahout 0.5 uses Lucene 3.x, while Mahout 0.8 and above use Lucene 4.x. Lucene 4.x is
not backward compatible with Lucene 3.x; most Lucene packages and classes have been refactored,
and the indexes are faster and leaner.


The issue you are seeing is due to the Lucene 3.x jars missing from your classpath. Add
lucene-core-3.5.jar to your classpath and you should be good.
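
To check what the JVM actually sees, here is a minimal diagnostic sketch. It assumes nothing about Mahout internals beyond the Class.newInstance call visible in the stack trace. It reports whether a class name can be loaded from the current classpath and whether it has a public no-argument constructor, which reflective instantiation of this kind needs. The class name ClasspathCheck and the method diagnose are hypothetical, made up for illustration.

```java
public class ClasspathCheck {

    /** Returns a short diagnosis for the given fully qualified class name. */
    static String diagnose(String className) {
        try {
            // Fails with ClassNotFoundException if no jar on the classpath provides the class.
            Class<?> cls = Class.forName(className);
            try {
                // Class.newInstance() needs a no-argument constructor to succeed.
                cls.getConstructor();
                return "loadable, has public no-arg constructor";
            } catch (NoSuchMethodException e) {
                return "loadable, but no public no-arg constructor";
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers a class that is present but whose dependencies are missing.
            return "not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.lucene.analysis.en.EnglishAnalyzer";
        System.out.println(name + ": " + diagnose(name));
    }
}
```

Compile it and run it with the same classpath you pass to Mahout, giving the analyzer class name as the only argument. A "not loadable" result points to a missing jar; a "loadable, but no public no-arg constructor" result would suggest the class cannot be created through Class.newInstance as it stands.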

On Wednesday, February 5, 2014 9:05 AM, Sznajder ForMailingList <bs4mailinglist@gmail.com> wrote:
 
Hi
I am using Mahout 0.5 and would like to use the EnglishAnalyzer when
running KMeans.

However, when running the following command, I get an exception:

bin/mahout seq2sparse -i logs-seqFiles/ \
    -o log-vectors-monogram-englishanalyzer -ow -s 1 \
    -a org.apache.lucene.analysis.en.EnglishAnalyzer


I get

Exception in thread "main" java.lang.InstantiationException: org.apache.lucene.analysis.en.EnglishAnalyzer
        at java.lang.J9VMInternals.newInstanceImpl(Native Method)
        at java.lang.Class.newInstance(Class.java:1375)
        at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:198)

How can I add this Analyzer to the classpath?

Benjamin