lucene-general mailing list archives

From Sam Cunningham <sam_cun...@yahoo.com>
Subject Mahout - 20news example
Date Sat, 29 Oct 2011 02:58:36 GMT
I am working on a text classification project and going through the examples
in the Mahout in Action book. The 20news example works fine for me, but there
is one thing I don't understand: why do the test data files include the target
variable (target variable - tab - text content)? I understand that training
requires target variables, but why do the test files need them? Isn't Mahout
supposed to determine them using the model created from training?

To test this, I renamed the folders under 20news-bydate-test to 1, 2, 3, ...,
20 and ran prepare20newsgroups to generate the files required by the naive
Bayes classifier. The new files used the renamed folder names (1, 2, 3, ...,
20) as target variables. When I ran testclassifier after training the
classifier, I received the following error. Why? Please help me understand.
Also, is there Java source code for the 20newsgroup Bayes classification
(instead of the command line)?

Exception in thread "main" java.lang.IllegalArgumentException: Label not found: 20
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
	at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
	at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
	at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
	at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
	at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
	at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:252)
	at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:185)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
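
The trace fails inside ConfusionMatrix.addInstance, which suggests that the label attached to each test document is only used to score the prediction against a known answer, not to make the prediction. The sketch below illustrates that idea under stated assumptions: SimpleConfusionMatrix and its methods are hypothetical names written for this illustration and are not Mahout's actual implementation. It assumes the matrix is built from the labels seen in training, so an actual label of "20" has no row to count against.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; this is not Mahout's ConfusionMatrix. */
class SimpleConfusionMatrix {
    // Maps each label known from training to a row/column index.
    private final Map<String, Integer> labelIndex = new LinkedHashMap<>();
    // counts[actual][predicted]
    private final int[][] counts;

    SimpleConfusionMatrix(List<String> trainingLabels) {
        for (String label : trainingLabels) {
            labelIndex.putIfAbsent(label, labelIndex.size());
        }
        counts = new int[labelIndex.size()][labelIndex.size()];
    }

    private int indexOf(String label) {
        Integer index = labelIndex.get(label);
        if (index == null) {
            // Assumed to mirror the failure mode in the trace above.
            throw new IllegalArgumentException("Label not found: " + label);
        }
        return index;
    }

    /** Records one test document: its true label and the classifier's guess. */
    void addInstance(String actualLabel, String predictedLabel) {
        counts[indexOf(actualLabel)][indexOf(predictedLabel)]++;
    }

    int getCount(String actualLabel, String predictedLabel) {
        return counts[indexOf(actualLabel)][indexOf(predictedLabel)];
    }

    /** Fraction of recorded documents whose prediction matched the true label. */
    double accuracy() {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            for (int j = 0; j < counts.length; j++) {
                total += counts[i][j];
                if (i == j) {
                    correct += counts[i][j];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        SimpleConfusionMatrix matrix =
            new SimpleConfusionMatrix(Arrays.asList("alt.atheism", "sci.space"));
        matrix.addInstance("sci.space", "sci.space");   // correct prediction
        matrix.addInstance("alt.atheism", "sci.space"); // wrong prediction
        System.out.println("accuracy = " + matrix.accuracy());
        // The renamed test folder "20" is not a training label, so this throws.
        matrix.addInstance("20", "sci.space");
    }
}
```

If this reading is right, the test labels must use the same names as the training labels for the evaluation step to work; unlabeled documents can still be classified, but they cannot be scored.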

