mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: NPE in bayes wiki example
Date Tue, 30 Nov 2010 02:41:05 GMT
On Tue, Nov 30, 2010 at 7:47 AM, Divya <divya@k2associates.com.sg> wrote:

> Hi,
>
> Thanks for the advice, Robin.
> But most of the time I don't get responses to the issues I am facing; that's why
> I reframe them and post them again.

Responses are usually delayed, depending on how much free time each of us
has. The Mahout community is made up of people who contribute as much as they
can when they find time, as it is not part of our day-to-day work. So in the
time we get, unless we see the details of the problem, we can't do anything
other than ask you again for details, and this round trip keeps the
conversation going. I can point to many tutorials (I read through them too
before hacking away on Mahout), like this one:
http://www.catb.org/~esr/faqs/smart-questions.html, which will help you
understand a bit more of why people on mailing lists behave the way you have
perceived.

>
> I hope someone can understand my problem and help me,
> as I am a newbie to Mahout and don't have any experience in this field.

We do want more newbies coming in to Mahout :)

> I am trying to run the Wikipedia classification example.
> I have downloaded the Wikipedia data set and split it into chunks (1 MB
> each).
> I am using one of the chunk files as the input data for the Wikipedia
> example.
>
>
> Steps I followed are:
> 1. Created the train input data set from one chunk of the Wikipedia data set
> and subjects.txt, using the wikipediaDataSetCreator CLI.
> 2. Repeated the first step, but this time used another chunk of the Wikipedia
> data set to create the test input data.
> 3. Trained the classifier by passing the train input data set.
> 4. Tested the classifier by passing the trained model as the model and the
> test input data set as testdir.
>
> Now the issue: when I run testclassifier with the trained model as the model
> and the train input data set as testdir, I can see the result in the form of
> a confusion matrix.
> But when I run testclassifier with the trained model as the model and the
> test input data set (which I created in the second step) as testdir, I get
> the NullPointerException shown in the mail below.
>
Now I understand what you are talking about. Can you do one thing: train
the model using the test input dataset and then try to classify the test
dataset with it. I want to check whether there is any corruption in the test
dataset that is causing this NPE.
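One cheap check to run alongside that retraining experiment is to compare the category labels that actually appear in the two generated datasets. Below is a minimal, hypothetical Java sketch (`LabelSetCheck` is not a Mahout class). It assumes each line of a `part-r-00000` file produced by wikipediaDataSetCreator is a label, a tab, and then the document's tokens; that format is an assumption, so confirm it against your own files before trusting the output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic, not part of Mahout. ASSUMES each input line is
// "<label>\t<document tokens>"; verify this against your actual files.
public class LabelSetCheck {

    // Returns the sorted set of labels (text before the first tab) found in the lines.
    static Set<String> labelsOf(List<String> lines) {
        Set<String> labels = new TreeSet<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            int tab = line.indexOf('\t');
            // A line without a tab does not fit the assumed format: possible corruption.
            labels.add(tab < 0 ? "<no-tab>" : line.substring(0, tab));
        }
        return labels;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = train part-r-00000, args[1] = test part-r-00000
        Set<String> train = labelsOf(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        Set<String> test = labelsOf(Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8));

        Set<String> unseen = new TreeSet<>(test);
        unseen.removeAll(train); // labels present in the test data but never trained on

        System.out.println("train labels: " + train);
        System.out.println("test labels:  " + test);
        System.out.println("test labels not in training data: " + unseen);
    }
}
```

For example, from examples/bin/work/wikipedia/wikipediaClassification: `java LabelSetCheck train-subject/part-r-00000 test-subject/part-r-00000`. A non-empty last line, or a `<no-tab>` entry, would be worth investigating before digging further into the classifier itself.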




>
> Name                                                          Size
> Initial train input data set (two chunks)                     2 MB
> Initial test input data set (one chunk)                       1 MB
> Train data set after wikipediaDataSetCreator (part-r-00000)   154 KB
> Test data set after wikipediaDataSetCreator (part-r-00000)    43 KB
> Trained model (trainer-thetaNormalizer)                       1 KB
> Trained model (trainer-tfIdf)                                 311 KB
> Trained model (trainer-weights\Sigma_j)                       215 KB
> Trained model (trainer-weights\Sigma_kSigma_j)                1 KB
> Trained model (trainer-weights\Sigma_k)                       1 KB
>

The model sizes look fine. In fact, model loading didn't seem to have any
issue, as per the logs you posted.
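The logs show the model loading completely and the exception being thrown later, inside `ConfusionMatrix.getCount`, while results are being tallied. The sketch below illustrates one generic way such an NPE can happen. It is a hypothetical class (`LabelMatrix`), not Mahout's `ConfusionMatrix`, and the label list is assumed: if counts are indexed through a `Map<String, Integer>` and a test document carries a label that was not registered when the matrix was built, `Map.get` returns `null`, and unboxing that `null` to an `int` throws `NullPointerException`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT Mahout's ConfusionMatrix: shows how a label lookup
// that is unboxed to int fails for a label the matrix was not built with.
public class LabelMatrix {
    private final Map<String, Integer> labelIndex = new HashMap<>();
    private final int[][] counts;

    LabelMatrix(List<String> labels) {
        for (int i = 0; i < labels.size(); i++) {
            labelIndex.put(labels.get(i), i);
        }
        counts = new int[labels.size()][labels.size()];
    }

    void addInstance(String correctLabel, String classifiedLabel) {
        // Map.get returns null for an unknown label; assigning that null
        // to an int auto-unboxes it and throws NullPointerException.
        int row = labelIndex.get(correctLabel);
        int col = labelIndex.get(classifiedLabel);
        counts[row][col]++;
    }

    int getCount(String correctLabel, String classifiedLabel) {
        return counts[labelIndex.get(correctLabel)][labelIndex.get(classifiedLabel)];
    }

    public static void main(String[] args) {
        // Matrix built from an ASSUMED label set: one trained category plus the default.
        LabelMatrix m = new LabelMatrix(Arrays.asList("history", "unknown"));
        m.addInstance("history", "history"); // known labels: counted normally
        System.out.println("history/history = " + m.getCount("history", "history"));
        try {
            m.addInstance("science", "history"); // label absent from the matrix
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for unseen label \"science\"");
        }
    }
}
```

Whether Mahout 0.4 fails for exactly this reason would have to be confirmed in its source. In this sketch, a corrupted test file that yields an unexpected label would trigger the same failure, so it does not rule out the corruption hypothesis.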

> I hope I will get a solution to my issue now.
>
> Thanks much
> Regards,
> Divya
>
> -----Original Message-----
> From: Robin Anil [mailto:robin.anil@gmail.com]
> Sent: Monday, November 29, 2010 7:34 PM
> To: user@mahout.apache.org
> Subject: Re: NPE in bayes wiki example
>
> Hi Divya, I am kind of overwhelmed by the flurry of emails from you and the
> replies. I am currently not able to make head or tail of the problem you
> are facing. It would be really helpful if you could write a bit more about
> the input files, the commands you ran, the output files generated, their
> sizes, and so on, and maybe use a single email thread for all Bayes
> classifier related problems. I guarantee you, I will be able to solve your
> issues with the Bayes classifier much faster.
>
> Regards
> Robin
>
> On Mon, Nov 29, 2010 at 12:54 PM, Divya <divya@k2associates.com.sg> wrote:
>
> > Hi,
> >
> > Steps I followed are below:
> >
> > $ bin/mahout wikipediaDataSetCreator \
> >     -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Traininput \
> >     -o examples/bin/work/wikipedia/wikipediaClassification/train-subject \
> >     -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
> >
> > $ bin/mahout wikipediaDataSetCreator \
> >     -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Testinput \
> >     -o examples/bin/work/wikipedia/wikipediaClassification/test-subject \
> >     -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
> >
> > $ bin/mahout trainclassifier \
> >     -i examples/bin/work/wikipedia/wikipediaClassification/train-subject \
> >     -o examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model
> >
> > $ bin/mahout testclassifier \
> >     -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model \
> >     -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
> >
> >
> > Regards,
> > Divya
> >
> >
> >
> > -----Original Message-----
> > From: Grant Ingersoll [mailto:gsingers@apache.org]
> > Sent: Saturday, November 27, 2010 8:54 PM
> > To: user@mahout.apache.org
> > Subject: Re: NPE in bayes wiki example
> >
> > Can you provide all the steps you have done up to this point?
> >
> > -Grant
> >
> > On Nov 25, 2010, at 12:57 AM, Divya wrote:
> >
> > > Hi,
> > >
> > > I am getting a NullPointerException when I pass my test input data to
> > > testclassifier:
> > >
> > > $ bin/mahout testclassifier \
> > >     -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model \
> > >     -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
> > >
> > > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > > 10/11/25 13:51:36 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/test-subject}
> > > 10/11/25 13:51:36 INFO bayes.TestClassifier: Testing Bayes Classifier
> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: 8.048212844092422
> > > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > > 10/11/25 13:51:39 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> > >
> > > Exception in thread "main" java.lang.NullPointerException
> > >        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
> > >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:118)
> > >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
> > >        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:90)
> > >        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
> > >        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:266)
> > >        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:186)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >        at java.lang.reflect.Method.invoke(Method.java:597)
> > >        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >        at java.lang.reflect.Method.invoke(Method.java:597)
> > >        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > >
> > > My categories file, subjects.txt, has two entries: History and Science.
> > >
> > > But when I pass the train input data, I do get results:
> > >
> > > $ bin/mahout testclassifier \
> > >     -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model \
> > >     -d examples/bin/work/wikipedia/wikipediaClassification/train-subject
> > >
> > > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > > 10/11/25 13:51:54 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/train-subject}
> > > 10/11/25 13:51:54 INFO bayes.TestClassifier: Testing Bayes Classifier
> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: 8.048212844092422
> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > > 10/11/25 13:51:55 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> > >
> > > 10/11/25 13:51:55 INFO bayes.TestClassifier: Classified instances from part-r-00000
> > > 10/11/25 13:51:55 INFO bayes.TestClassifier:
> > > =======================================================
> > > Summary
> > > -------------------------------------------------------
> > > Correctly Classified Instances          :          2           100%
> > > Incorrectly Classified Instances        :          0             0%
> > > Total Classified Instances              :          2
> > >
> > > =======================================================
> > > Confusion Matrix
> > > -------------------------------------------------------
> > > a       <--Classified as
> > > 2        |  2           a     = history
> > > Default Category: unknown: 1
> > >
> > > 10/11/25 13:51:55 INFO driver.MahoutDriver: Program took 953 ms
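The Summary block in this run follows directly from the confusion matrix: correctly classified instances are the diagonal cells (true label equals predicted label), and the total is the sum of all cells. A small hypothetical Java helper (`ConfusionSummary` is an illustrative name, not a Mahout class) that reproduces the 2 correct, 0 incorrect, 100% figures from the one-label matrix shown, assuming rows are true labels and columns are predictions:

```java
// Hypothetical helper, not a Mahout class: derives summary figures from a
// square confusion matrix (ASSUMED: rows = true labels, columns = predictions).
public class ConfusionSummary {

    // Correct predictions lie on the diagonal (true label == predicted label).
    static int correct(int[][] m) {
        int sum = 0;
        for (int i = 0; i < m.length; i++) {
            sum += m[i][i];
        }
        return sum;
    }

    // Every classified instance lands in exactly one cell.
    static int total(int[][] m) {
        int sum = 0;
        for (int[] row : m) {
            for (int cell : row) {
                sum += cell;
            }
        }
        return sum;
    }

    // Percentage of correct predictions; 0 for an empty matrix to avoid dividing by zero.
    static double percentCorrect(int[][] m) {
        int t = total(m);
        return t == 0 ? 0.0 : 100.0 * correct(m) / t;
    }

    public static void main(String[] args) {
        // The matrix from the quoted run: one label (history), 2 instances, both classified as history.
        int[][] m = {{2}};
        System.out.printf("Correct: %d, Incorrect: %d, Total: %d (%.0f%% correct)%n",
                correct(m), total(m) - correct(m), total(m), percentCorrect(m));
    }
}
```

Note that a matrix with a single row can only ever report 100% or 0%, so this successful run on the training data says little about how the model handles the second category.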
> > >
> > > Can someone please explain the reason behind this?
> > >
> > > Thanks
> > > Regards,
> > > Divya
> > >
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem docs using Solr/Lucene:
> > http://www.lucidimagination.com/search
