mahout-user mailing list archives

From "Divya" <di...@k2associates.com.sg>
Subject RE: NPE in bayes wiki example
Date Tue, 30 Nov 2010 02:17:11 GMT
Hi,

Thanks for the advice, Robin.
Most of the time I don't get a response to the issues I am facing, which is why I rephrase
them and post again,
hoping that someone will understand my problem and be able to help.
I am a newbie to Mahout and have no experience in this field.

I am trying to run the Wikipedia classification example.
I have downloaded the Wikipedia data set and split it into chunks (1 MB each).
I am using one of the chunk files as my input data for the Wikipedia example.


The steps I followed are:
1. Created the train input data set from one chunk of the Wikipedia data set and subjects.txt,
using the wikipediaDataSetCreator CLI.
2. Repeated the first step, but used another chunk of the Wikipedia data set to create the
test input data.
3. Trained the classifier on the train input data set.
4. Tested the classifier, passing the trained model as the model and the test input data set as
testdir.

The issue: when I run testclassifier with the trained model as the model and the train
input data set as testdir, I can see the result as a confusion matrix.
But when I run testclassifier with the trained model as the model and the test
input data set (created in the second step) as testdir, I get a null pointer exception,
as shown in the mail below.

Name                                                          Size

Initial train input data set                                  2 MB (two chunks)
Initial test input data set                                   1 MB (one chunk)
Train data set after wikipediaDataSetCreator (part-r-00000)   154 KB
Test data set after wikipediaDataSetCreator (part-r-00000)    43 KB
Train model data set (trainer-thetaNormalizer)                1 KB
Train model data set (trainer-tfIdf)                          311 KB
Train model data set (trainer-weights\Sigma_j)                215 KB
Train model data set (trainer-weights\Sigma_kSigma_j)         1 KB
Train model data set (trainer-weights\Sigma_k)                1 KB
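
One possible cause of the NullPointerException, not confirmed anywhere in this thread: the
quoted log below shows only the label "history" being loaded from the model, while
subjects.txt has two entries (History and Science). If the test data contains a document
labelled with a category the model never saw, a lookup of that label in a label-to-index
map would return null, and unboxing it would fail in the way the stack trace suggests.
The sketch below illustrates that failure mode only. The class LabelLookupSketch and its
methods are hypothetical and are not Mahout's ConfusionMatrix source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a label-indexed confusion matrix (NOT Mahout's code).
public class LabelLookupSketch {
    private final Map<String, Integer> labelIndex = new HashMap<>();
    private final int[][] counts;

    public LabelLookupSketch(List<String> labels) {
        // Assign each known label a row/column index in insertion order.
        for (String label : labels) {
            labelIndex.put(label, labelIndex.size());
        }
        counts = new int[labels.size()][labels.size()];
    }

    // Unsafe variant: Map.get returns null for an unknown label, and
    // auto-unboxing null into an int throws NullPointerException.
    public void addUnchecked(String correct, String classified) {
        int row = labelIndex.get(correct);
        int col = labelIndex.get(classified);
        counts[row][col]++;
    }

    // Defensive variant: returns false instead of failing on an unknown label.
    public boolean addChecked(String correct, String classified) {
        Integer row = labelIndex.get(correct);
        Integer col = labelIndex.get(classified);
        if (row == null || col == null) {
            return false;
        }
        counts[row][col]++;
        return true;
    }

    public int count(String correct, String classified) {
        return counts[labelIndex.get(correct)][labelIndex.get(classified)];
    }

    public static void main(String[] args) {
        // The matrix only knows "history", as the model in the log appears to.
        LabelLookupSketch matrix = new LabelLookupSketch(Arrays.asList("history"));
        System.out.println(matrix.addChecked("history", "history"));
        System.out.println(matrix.addChecked("science", "history"));
        try {
            matrix.addUnchecked("science", "history");
        } catch (NullPointerException e) {
            System.out.println("NPE for a label missing from the matrix");
        }
    }
}
```

If this is what is happening, checking which labels appear in the train and test
part-r-00000 files, and training on data that covers both categories, would be a
reasonable first thing to try.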


I hope I will get a solution to my issue now.

Thanks much  
Regards,
Divya 







-----Original Message-----
From: Robin Anil [mailto:robin.anil@gmail.com] 
Sent: Monday, November 29, 2010 7:34 PM
To: user@mahout.apache.org
Subject: Re: NPE in bayes wiki example

Hi Divya, I am kind of overwhelmed by the flurry of emails from you and the
replies. I am currently not able to make head or tail of the problem you
are facing. It would be really helpful if you could write a bit more about the
input files, the commands you ran, the output files generated, their sizes,
and so on, and maybe use a single email thread for all Bayes classifier
related problems. I guarantee you, I will then be able to solve your issues with
the Bayes classifier much faster.

Regards
Robin

On Mon, Nov 29, 2010 at 12:54 PM, Divya <divya@k2associates.com.sg> wrote:

> Hi,
>
> Steps I followed are below :
>
> $ bin/mahout wikipediaDataSetCreator \
>     -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Traininput \
>     -o examples/bin/work/wikipedia/wikipediaClassification/train-subject \
>     -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>
> $ bin/mahout wikipediaDataSetCreator \
>     -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Testinput \
>     -o examples/bin/work/wikipedia/wikipediaClassification/test-subject \
>     -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>
> $ bin/mahout trainclassifier \
>     -i examples/bin/work/wikipedia/wikipediaClassification/train-subject \
>     -o examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model
>
> $ bin/mahout testclassifier \
>     -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model \
>     -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
>
>
> Regards,
> Divya
>
>
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Saturday, November 27, 2010 8:54 PM
> To: user@mahout.apache.org
> Subject: Re: NPE in bayes wiki example
>
> Can you provide all the steps you have done up to this point?
>
> -Grant
>
> On Nov 25, 2010, at 12:57 AM, Divya wrote:
>
> > Hi,
> >
> > I am getting a null pointer exception when I pass my test input data to
> > testclassifier:
> >
> >
> >
> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
> > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > 10/11/25 13:51:36 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/test-subject}
> > 10/11/25 13:51:36 INFO bayes.TestClassifier: Testing Bayes Classifier
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: 8.048212844092422
> > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > 10/11/25 13:51:39 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> > Exception in thread "main" java.lang.NullPointerException
> >        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
> >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:118)
> >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
> >        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:90)
> >        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
> >        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:266)
> >        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:186)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> >
> > My category file is subjects.txt, which has two entries: History and Science.
> >
> > But when I pass the train input data, I do get results:
> >
> >
> >
> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/train-subject
> > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > 10/11/25 13:51:54 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/train-subject}
> > 10/11/25 13:51:54 INFO bayes.TestClassifier: Testing Bayes Classifier
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: 8.048212844092422
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > 10/11/25 13:51:55 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> > 10/11/25 13:51:55 INFO bayes.TestClassifier: Classified instances from part-r-00000
> > 10/11/25 13:51:55 INFO bayes.TestClassifier:
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances          :          2           100%
> > Incorrectly Classified Instances        :          0             0%
> > Total Classified Instances              :          2
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a       <--Classified as
> > 2        |  2           a     = history
> > Default Category: unknown: 1
> >
> > 10/11/25 13:51:55 INFO driver.MahoutDriver: Program took 953 ms
> >
> >
> >
> >
> >
> > Can someone please explain the reason behind it?
> >
> >
> >
> > Thanks
> >
> > Regards,
> >
> > Divya
> >
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem docs using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
>

