mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: NPE in bayes wiki example
Date Mon, 29 Nov 2010 11:33:58 GMT
Hi Divya, I am kind of overwhelmed by the flurry of emails from you and the
replies. I currently cannot make head or tail of the problem you are
facing. It would be really helpful if you could write a bit more about the
input files, the commands you ran, the output files generated, their sizes,
and so on, and maybe use a single email thread for all Bayes classifier
related problems. I guarantee you, I will then be able to solve your issues
with the Bayes classifier much faster.
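
As a rough sketch of the kind of information that would help: something along
these lines would list each directory with its files and sizes. It assumes a
Unix-like shell (such as your Cygwin setup) and that the paths in the commands
quoted below are relative to your Mahout directory. The describe_dirs helper is
only an illustrative name, not part of Mahout.

```shell
#!/bin/sh
# Print the total size and file listing of each directory given as an argument.
# Missing directories are reported explicitly rather than skipped silently.
describe_dirs() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      echo "== $dir"
      du -sk "$dir"   # total size in kilobytes
      ls -l "$dir"    # individual files with their sizes
    else
      echo "== $dir (missing)"
    fi
  done
}

# Paths taken from the commands quoted below; adjust them if yours differ.
BASE=examples/bin/work/wikipedia/wikipediaClassification
describe_dirs "$BASE/Traininput" "$BASE/Testinput" \
              "$BASE/train-subject" "$BASE/test-subject" \
              "$BASE/wikipedia-subject-model"
```

Pasting that output into a single reply, together with the exact commands,
would make it much easier to see where the test data differs from the training
data.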

Regards
Robin

On Mon, Nov 29, 2010 at 12:54 PM, Divya <divya@k2associates.com.sg> wrote:

> Hi,
>
> The steps I followed are below:
>
> $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Traininput -o examples/bin/work/wikipedia/wikipediaClassification/train-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>
> $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Testinput -o examples/bin/work/wikipedia/wikipediaClassification/test-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>
> $ bin/mahout trainclassifier -i examples/bin/work/wikipedia/wikipediaClassification/train-subject -o examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model
>
> $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
>
>
> Regards,
> Divya
>
>
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Saturday, November 27, 2010 8:54 PM
> To: user@mahout.apache.org
> Subject: Re: NPE in bayes wiki example
>
> Can you provide all the steps you have done up to this point?
>
> -Grant
>
> On Nov 25, 2010, at 12:57 AM, Divya wrote:
>
> > Hi,
> >
> > I am getting a NullPointerException when I pass my test input data to
> > testclassifier:
> >
> >
> >
> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
> > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > 10/11/25 13:51:36 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/test-subject}
> > 10/11/25 13:51:36 INFO bayes.TestClassifier: Testing Bayes Classifier
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: 8.048212844092422
> > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > 10/11/25 13:51:39 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> > Exception in thread "main" java.lang.NullPointerException
> >        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
> >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:118)
> >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
> >        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:90)
> >        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
> >        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:266)
> >        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:186)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> >
> >
> > My category file (passed with -c) is subjects.txt, which has two
> > entries: History and Science.
> >
> > But when I pass the training input data instead, I do get results:
> >
> >
> >
> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/train-subject
> > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > 10/11/25 13:51:54 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/train-subject}
> > 10/11/25 13:51:54 INFO bayes.TestClassifier: Testing Bayes Classifier
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: 8.048212844092422
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > 10/11/25 13:51:55 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> > 10/11/25 13:51:55 INFO bayes.TestClassifier: Classified instances from part-r-00000
> > 10/11/25 13:51:55 INFO bayes.TestClassifier:
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances          :          2           100%
> > Incorrectly Classified Instances        :          0             0%
> > Total Classified Instances              :          2
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a       <--Classified as
> > 2        |  2           a     = history
> > Default Category: unknown: 1
> >
> > 10/11/25 13:51:55 INFO driver.MahoutDriver: Program took 953 ms
> >
> > Can someone please explain the reason behind this?
> >
> > Thanks
> >
> > Regards,
> > Divya
> >
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem docs using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
>
