lucene-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Mahout - 20news example
Date Sat, 29 Oct 2011 06:35:25 GMT
The target variable is included in the test file so that the classifier can
be run and its output compared against the correct answer in a single pass.
Otherwise, all you would get is the classifier's output, and you would have
to run an entirely separate program to find out which answers were correct
and which were not. Having classification and verification happen together
is simply easier.
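
To make that concrete, here is a minimal sketch of what evaluating against
labeled test data involves. It is an illustration only, not Mahout's code:
the class TinyConfusionMatrix and its methods are hypothetical names, and the
assumption that the evaluator only knows the labels it was constructed with
is mine, inferred from the "Label not found" check in the stack trace below.
Under that assumption, a test label such as "20" that never appeared in
training would fail in this way.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is NOT Mahout's ConfusionMatrix.
class TinyConfusionMatrix {
    // Maps each known label to its row/column position.
    private final Map<String, Integer> index = new LinkedHashMap<String, Integer>();
    // counts[actual][predicted]
    private final int[][] counts;

    TinyConfusionMatrix(List<String> knownLabels) {
        for (String label : knownLabels) {
            index.put(label, index.size());
        }
        counts = new int[index.size()][index.size()];
    }

    // Rejects any label that was not supplied at construction time.
    private int indexOf(String label) {
        Integer i = index.get(label);
        if (i == null) {
            throw new IllegalArgumentException("Label not found: " + label);
        }
        return i;
    }

    // Records one test document: its true label and the classifier's guess.
    void addInstance(String actual, String predicted) {
        counts[indexOf(actual)][indexOf(predicted)]++;
    }

    int getCount(String actual, String predicted) {
        return counts[indexOf(actual)][indexOf(predicted)];
    }

    // Fraction of recorded instances where the guess matched the true label.
    double accuracy() {
        int correct = 0;
        int total = 0;
        for (int r = 0; r < counts.length; r++) {
            for (int c = 0; c < counts[r].length; c++) {
                total += counts[r][c];
                if (r == c) {
                    correct += counts[r][c];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        TinyConfusionMatrix m =
            new TinyConfusionMatrix(Arrays.asList("rec.autos", "sci.space"));
        m.addInstance("rec.autos", "rec.autos"); // correct guess
        m.addInstance("sci.space", "rec.autos"); // wrong guess
        System.out.println("accuracy = " + m.accuracy());
        try {
            // A test label that is not among the known labels.
            m.addInstance("20", "sci.space");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Without the true label on each test document there would be nothing to pass
as the first argument of addInstance, so no accuracy figure could be
computed in the same run.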


On Fri, Oct 28, 2011 at 7:58 PM, Sam Cunningham <sam_cunnin@yahoo.com> wrote:

> I have a text classification project, so I am going through the examples
> provided in the Mahout in Action book. The 20news example works fine for
> me. However, I don't understand something: why do we include the target
> variables in the test data files (target variable - tab - text content)? I
> understand that we need to provide target variables in order to train the
> program, but I don't understand why we include them in the test files.
> Isn't Mahout supposed to determine them by using the model created from
> training? Just to test that, I renamed the folders under
> 20news-bydate-test to 1, 2, 3, ..., 20. Then I ran prepare20newsgroups to
> generate the files required for the naive Bayes classifier. The new files
> used the renamed folder names (1, 2, 3, ..., 20) as target variables.
> When I ran testclassifier after training the classifier, I received the
> following error. Why? Please help me understand. Also, is there Java
> source code for 20newsgroup bayes classification (instead of command line)?
>
> Exception in thread "main" java.lang.IllegalArgumentException: Label not found: 20
>        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
>        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
>        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
>        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
>        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
>        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:252)
>        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:185)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
