From: "Divya"
To: user@mahout.apache.org
Subject: RE: NPE in bayes wiki example
Date: Tue, 30 Nov 2010 10:17:11 +0800

Hi,

Thanks for the advice, Robin. Most of the time I don't get a response to the issues I am facing; that's why I reframe them and post again, hoping someone will understand my problem and be able to help me. I am a newbie to Mahout and don't have any experience in this field.

I am trying to run the Wikipedia classification example. I have downloaded the Wikipedia data set and split it into chunks of 1 MB each. I am using one of the chunk files as my input data for the Wikipedia example.

The steps I followed are:

1. Created the train input data set from one of the chunks of the Wikipedia data set and subjects.txt, with the help of the wikipediaDataSetCreator CLI.
2. Repeated the first step, but used another chunk of the Wikipedia data set to create the test input data.
3. Trained the classifier by passing the train input data set.
4. Tested the classifier by passing the trained data set as model and the test input data set as testdir.

Now the issue: when I run testclassifier with the trained data set as model and the train input data set as testdir, I am able to view the result in the form of a confusion matrix. But when I run testclassifier with the trained data set as model and the test input data set (which I created in the second step) as testdir, I get a null pointer exception, as shown in the mail below.

Name                                                          Size
Initial train input data set                                  2 MB (two chunks)
Initial test input data set                                   1 MB (one chunk)
Train data set after wikipediaDataSetCreator (part-r-00000)   154 KB
Test data set after wikipediaDataSetCreator (part-r-00000)    43 KB
Train model data set (trainer-thetaNormalizer)                1 KB
Train model data set (trainer-tfIdf)                          311 KB
Train model data set (trainer-weights\Sigma_j)                215 KB
Train model data set (trainer-weights\Sigma_kSigma_j)         1 KB
Train model data set (trainer-weights\Sigma_k)                1 KB

I hope I will get a solution to my issue now.

Thanks much

Regards,
Divya

-----Original Message-----
From: Robin Anil [mailto:robin.anil@gmail.com]
Sent: Monday, November 29, 2010 7:34 PM
To: user@mahout.apache.org
Subject: Re: NPE in bayes wiki example

Hi Divya, I am kind of overwhelmed by the flurry of emails from you and the replies. I am currently not able to make head or tail of the problem you are facing. It would be really helpful if you could write a bit more about the input files, the commands you ran, the output files generated, their sizes, and so on, and maybe use a single email thread for all Bayes classifier related problems. I guarantee you, I will be able to solve your issues with the Bayes classifier much faster.

Regards
Robin

On Mon, Nov 29, 2010 at 12:54 PM, Divya wrote:

> Hi,
>
> Steps I followed are below :
>
> $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Traininput -o examples/bin/work/wikipedia/wikipediaClassification/train-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>
> $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Testinput -o examples/bin/work/wikipedia/wikipediaClassification/test-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>
> $ bin/mahout trainclassifier -i examples/bin/work/wikipedia/wikipediaClassification/train-subject -o examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model
>
> $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
>
> Regards,
> Divya
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Saturday, November 27, 2010 8:54 PM
> To: user@mahout.apache.org
> Subject: Re: NPE in bayes wiki example
>
> Can you provide all the steps you have done up to this point?
>
> -Grant
>
> On Nov 25, 2010, at 12:57 AM, Divya wrote:
>
> > Hi,
> >
> > I am getting null pointer exception when I pass my test input data to testclassifier
> >
> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
> > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > 10/11/25 13:51:36 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/test-subject}
> > 10/11/25 13:51:36 INFO bayes.TestClassifier: Testing Bayes Classifier
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: 8.048212844092422
> > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > 10/11/25 13:51:39 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> > Exception in thread "main" java.lang.NullPointerException
> >     at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
> >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:118)
> >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
> >     at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:90)
> >     at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
> >     at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:266)
> >     at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:186)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> > My classifier is subjects.txt which has two entries History and Science.
> >
> > but when I pass train input data I get to see the results
> >
> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/train-subject
> > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > 10/11/25 13:51:54 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/train-subject}
> > 10/11/25 13:51:54 INFO bayes.TestClassifier: Testing Bayes Classifier
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: 8.048212844092422
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
> > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
> > 10/11/25 13:51:55 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
> > 10/11/25 13:51:55 INFO bayes.TestClassifier: Classified instances from part-r-00000
> > 10/11/25 13:51:55 INFO bayes.TestClassifier:
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances          :          2       100%
> > Incorrectly Classified Instances        :          0         0%
> > Total Classified Instances              :          2
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a       <--Classified as
> > 2        |  2     a     = history
> > Default Category: unknown: 1
> >
> > 10/11/25 13:51:55 INFO driver.MahoutDriver: Program took 953 ms
> >
> > Can someone please explain the reason behind it.
> >
> > Thanks
> > Regards,
> > Divya
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem docs using Solr/Lucene:
> http://www.lucidimagination.com/search
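
One possible reading of the trace in this thread, offered as an assumption and not a confirmed diagnosis: the successful run reports only the label history in its confusion matrix, although subjects.txt has two entries. If the test chunk contains a document labelled with a category the model never saw, a label-to-index lookup that returns null and is then unboxed to int would fail with a NullPointerException of the kind reported in ConfusionMatrix.getCount. The class below, LabelLookupSketch, is hypothetical and is not Mahout's ConfusionMatrix; it only reproduces that failure mode in isolation, next to a checked variant.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified label-indexed count matrix; NOT Mahout's ConfusionMatrix. */
public class LabelLookupSketch {
    private final Map<String, Integer> labelIndex = new HashMap<>();
    private final int[][] counts;

    /** Builds the index from the labels the (assumed) trained model knows about. */
    public LabelLookupSketch(List<String> modelLabels) {
        for (String label : modelLabels) {
            labelIndex.put(label, labelIndex.size());
        }
        counts = new int[modelLabels.size()][modelLabels.size()];
    }

    /** Unchecked lookup: unboxing a missing Integer throws NullPointerException. */
    public void addInstanceUnchecked(String correctLabel, String classifiedLabel) {
        int row = labelIndex.get(correctLabel);    // null -> NPE if the label is unknown
        int col = labelIndex.get(classifiedLabel); // same risk for the predicted label
        counts[row][col]++;
    }

    /** Checked lookup: returns false instead of failing when a label is unknown. */
    public boolean addInstanceChecked(String correctLabel, String classifiedLabel) {
        Integer row = labelIndex.get(correctLabel);
        Integer col = labelIndex.get(classifiedLabel);
        if (row == null || col == null) {
            return false;
        }
        counts[row][col]++;
        return true;
    }

    /** Returns the count for a pair of labels that are both present in the index. */
    public int getCount(String correctLabel, String classifiedLabel) {
        return counts[labelIndex.get(correctLabel)][labelIndex.get(classifiedLabel)];
    }

    public static void main(String[] args) {
        // Assumption: the model only learned "history", as the logs in the thread suggest.
        LabelLookupSketch matrix = new LabelLookupSketch(Arrays.asList("history"));
        matrix.addInstanceUnchecked("history", "history");
        try {
            matrix.addInstanceUnchecked("science", "history");
        } catch (NullPointerException e) {
            System.out.println("NPE for a label missing from the model: science");
        }
        System.out.println("checked variant accepted science: "
                + matrix.addInstanceChecked("science", "history"));
    }
}
```

If this assumption holds, comparing the labels present in test-subject with those in train-subject would be a reasonable first check before rerunning testclassifier.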