Hi,
I got the debugger running in Eclipse so I could watch what was happening under
the hood.
Here's the exception again:
Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
    at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
    at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
    at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
    at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
Notice the "Label not found: alt.atheism\tfrom" (that is, a tab character
followed by "from" right after the label).
That's an invalid label in the confusion matrix. I think it SHOULD be just
alt.atheism. I'm not sure how the \tfrom is getting in there, but it is.
Perhaps it has something to do with the way my test data was formatted.
I'll keep digging...
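
To make the suspected failure mode concrete, here is a minimal Java sketch. It
is not Mahout code: LabelLookupSketch, extractLabel and buildIndex are
hypothetical names, and the "<label>\t<text>" line layout is only an assumption
about how the test data is formatted. It just shows that an exact-match lookup
fails while the label still carries the "\tfrom" suffix, and succeeds once
everything after the first tab is dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Mahout.
class LabelLookupSketch {

    // Assumption: a test line looks like "<label>\t<document text>".
    // Returns the text before the first tab, or the whole line if there is no tab.
    static String extractLabel(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line.trim() : line.substring(0, tab).trim();
    }

    // Maps each known label to its position, similar in spirit to a
    // confusion matrix that only accepts labels it was built with.
    static Map<String, Integer> buildIndex(String... labels) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String label : labels) {
            index.put(label, index.size());
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = buildIndex("alt.atheism", "sci.space");
        String suspect = "alt.atheism\tfrom";

        // The raw string is not a known label, so an exact lookup fails.
        System.out.println(index.containsKey(suspect));               // false
        // Cutting at the first tab recovers the expected label.
        System.out.println(index.containsKey(extractLabel(suspect))); // true
    }
}
```

If the real cause is something like this, the fix would belong wherever the
label is parsed out of the test input, not in the confusion matrix itself.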
Thanks,
Vijay
On Mon, Jul 4, 2011 at 8:52 PM, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> Hi Robin,
>
> The console dump was too large for pastebin, so I uploaded it here:
> http://dl.dropbox.com/u/7881451/build20newsbayesconsoleoutput.txt
>
> I performed a fresh checkout only hours ago, and I used the script
> examples/bin/build20newsbayes.sh.
> I've opted to avoid Hadoop, but from what I can tell the model was created
> successfully.
>
>
> Thanks,
> Vijay
>
>
> On Mon, Jul 4, 2011 at 8:28 PM, Robin Anil <robin.anil@gmail.com> wrote:
>
>> Can you send me the console dump?
>> Put the command line and the log written by the program on, say, pastebin.
>>
>> Robin
>>
>> On Mon, Jul 4, 2011 at 3:48 PM, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>>
>> > I tried deleting all the folders from the test and train data except for
>> > alt.atheism, but I get the identical error.
>> >
>> > I might try debugging the problem in Eclipse rather than from the command
>> > line, but Eclipse doesn't quite want to work either.
>> >
>> >
>> > On Mon, Jul 4, 2011 at 8:02 PM, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>> >
>> > > Thanks anyway, Sergey. Could you perhaps upload your bayesmodel folder so I
>> > > could try that out?
>> > >
>> > >
>> > >
>> > > On Mon, Jul 4, 2011 at 7:57 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
>> > >
>> > >> Well, that's strange. Sorry, I can't help you at the moment; maybe
>> > >> someone else on the mailing list can.
>> > >>
>> > >> On 4 July 2011 13:49, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>> > >> > Hi Sergey,
>> > >> >
>> > >> > Yes, there were no errors.
>> > >> >
>> > >> > And all the model data seems to have been populated into the bayesmodel
>> > >> > folder. Also, each main folder in bayesmodel has a _SUCCESS file.
>> > >> >
>> > >> > See the tarball of my trained model here:
>> > >> > http://dl.dropbox.com/u/7881451/bayesmodel.tar.gz
>> > >> > Please compare it to your trained model if possible; I would like to
>> > >> > know if it's different in any way.
>> > >> >
>> > >> > Perhaps it's corrupted in some way.
>> > >> >
>> > >> > Thanks,
>> > >> > Vijay
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Mon, Jul 4, 2011 at 7:39 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
>> > >> >
>> > >> >> Stop, did you _train_ the classifier successfully before running the
>> > >> >> _test_?
>> > >> >>
>> > >> >> On 4 July 2011 13:30, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>> > >> >> > Hi Sergey,
>> > >> >> >
>> > >> >> > I've tried using both the sh script file and following the instructions at
>> > >> >> > https://cwiki.apache.org/MAHOUT/twentynewsgroups.html like you suggested.
>> > >> >> > Both return the same results.
>> > >> >> >
>> > >> >> > I've uploaded my bayestestinput folder to Dropbox; the first file is here:
>> > >> >> > http://dl.dropbox.com/u/7881451/bayestestinput/alt.atheism.txt
>> > >> >> > Thanks,
>> > >> >> > Vijay
>> > >> >> >
>> > >> >> > On Mon, Jul 4, 2011 at 7:23 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
>> > >> >> >
>> > >> >> >> Paste your bayestestinput file somewhere.
>> > >> >> >>
>> > >> >> >> On 4 July 2011 13:20, Sergey Bartunov <sbos.net@gmail.com> wrote:
>> > >> >> >> > Yes, I worked WITH Hadoop, but there should be no difference.
>> > >> >> >> >
>> > >> >> >> > Why do you use examples/bin/build/20newsbayes.sh instead of running
>> > >> >> >> > bin/mahout directly? Is it the same?
>> > >> >> >> >
>> > >> >> >> > On 4 July 2011 13:12, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>> > >> >> >> >> Thanks Sergey,
>> > >> >> >> >>
>> > >> >> >> >> I'm still receiving the same error after following those steps.
>> > >> >> >> >> I've chosen not to use Hadoop. Does yours work WITH Hadoop?
>> > >> >> >> >>
>> > >> >> >> >> A few bits of info that might be relevant.
>> > >> >> >> >>
>> > >> >> >> >> My examples/bin/work folder contains the expected folders from test data
>> > >> >> >> >> preparation and training...
>> > >> >> >> >> drwxrxrx@ 22 Vijay staff 748 18 Mar 2003 20newsbydatetest
>> > >> >> >> >> drwxrxrx@ 22 Vijay staff 748 18 Mar 2003 20newsbydatetrain
>> > >> >> >> >> drwxrxrx 3 Vijay staff 102 4 Jul 19:03 bayesmodel
>> > >> >> >> >> drwxrxrx 22 Vijay staff 748 4 Jul 18:20 bayestestinput
>> > >> >> >> >> drwxrxrx 22 Vijay staff 748 4 Jul 17:49 bayestraininput
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> I appreciate your help. Do you have any other suggestions?
>> > >> >> >> >>
>> > >> >> >> >> Regards,
>> > >> >> >> >> Vijay
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> On Mon, Jul 4, 2011 at 6:58 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
>> > >> >> >> >>
>> > >> >> >> >>> When I started with Mahout I had the same errors. In my case, I just
>> > >> >> >> >>> didn't run PrepareTwentyNewsgroups. You may try to carefully repeat
>> > >> >> >> >>> all the steps from https://cwiki.apache.org/MAHOUT/twentynewsgroups.html
>> > >> >> >> >>>
>> > >> >> >> >>> On 4 July 2011 12:52, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>> > >> >> >> >>> > Hi All,
>> > >> >> >> >>> >
>> > >> >> >> >>> > I'm new to Mahout and I'm interested in experimenting with its classifiers.
>> > >> >> >> >>> >
>> > >> >> >> >>> > Right now, I'm just trying to get up and running with the demos and
>> > >> >> >> >>> > examples.
>> > >> >> >> >>> >
>> > >> >> >> >>> > After checking out the Mahout trunk, I tried running the 20news
>> > >> >> >> >>> > classification example, but after running the
>> > >> >> >> >>> > ./examples/bin/build/20newsbayes.sh script I get the following error
>> > >> >> >> >>> > during the classification phase.
>> > >> >> >> >>> >
>> > >> >> >> >>> > Does anyone else get the same thing? Or have any recommendations about
>> > >> >> >> >>> > how to fix it?
>> > >> >> >> >>> > I'd just like to get a sample classifier working before I embark on my
>> > >> >> >> >>> > own classification journey.
>> > >> >> >> >>> >
>> > >> >> >> >>> >
>> > >> >> >> >>> > INFO: Loading model from: {basePath=examples/bin/work/20newsbydate/bayesmodel, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF8, defaultCat=unknown, testDirPath=examples/bin/work/20newsbydate/bayestestinput}
>> > >> >> >> >>> > Jul 4, 2011 6:28:25 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: Testing Bayes Classifier
>> > >> >> >> >>> > Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: Read 50000 feature weights
>> > >> >> >> >>> > Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: Read 100000 feature weights
>> > >> >> >> >>> > Jul 4, 2011 6:28:28 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: 193370.88331085522
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: rec.sport.baseball 129829.34738930278 531784.7805631821 0.2441388925268003
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: sci.crypt 193023.42370049533 531784.7805631821 0.3629728242618669
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: rec.sport.hockey 167853.6159738822 531784.7805631821 0.31564200802459647
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: talk.politics.guns 203524.0148974065 531784.7805631821 0.3827187658170024
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: soc.religion.christian 163900.9258713857 531784.7805631821 0.308209132457322
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: sci.electronics 142854.1677345925 531784.7805631821 0.26863154598614886
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: comp.os.mswindows.misc 531784.7805631821 531784.7805631821 1.0
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: misc.forsale 143454.70176448982 531784.7805631821 0.26976082619845826
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: talk.religion.misc 139428.73484148504 531784.7805631821 0.2621901565024562
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: alt.atheism 139569.06867597546 531784.7805631821 0.2624540486626301
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: comp.windows.x 178029.10523376046 531784.7805631821 0.33477660839638973
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: talk.politics.mideast 193075.00789450994 531784.7805631821 0.36306982627452317
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: comp.sys.ibm.pc.hardware 138410.02049984262 531784.7805631821 0.2602745049477736
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: comp.sys.mac.hardware 125200.9927438868 531784.7805631821 0.23543545682389364
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: sci.space 192437.0009266271 531784.7805631821 0.3618700797018455
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: rec.motorcycles 143142.20855440624 531784.7805631821 0.26917319522159455
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: rec.autos 141800.97549909537 531784.7805631821 0.2666510601317365
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: comp.graphics 166882.18654471825 531784.7805631821 0.3138152738556811
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: talk.politics.misc 165196.84193278523 531784.7805631821 0.3106460507535303
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: sci.med 192698.5183245711 531784.7805631821 0.36236185270382393
>> > >> >> >> 0.36236185270382393
>> > >> >> >> >>> > Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
>> > >> >> >> >>> >     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
>> > >> >> >> >>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >> >> >> >>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >> >> >> >>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >> >> >> >>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> > >> >> >> >>> >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> > >> >> >> >>> >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> > >> >> >> >>> >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>> > >> >> >> >>> >
>> > >> >> >> >>> >
>> > >> >> >> >>> > Any help is greatly appreciated.
>> > >> >> >> >>> >
>> > >> >> >> >>> > Regards,
>> > >> >> >> >>>
Vijay Santhanam
Software Engineer
http://au.linkedin.com/in/vijaysanthanam
0407525087
