mahout-user mailing list archives

From Vijay Santhanam <vijay.santha...@gmail.com>
Subject Re: 20news
Date Mon, 04 Jul 2011 11:16:38 GMT
Hi,

I got the debugger running with Eclipse so I could watch what was happening
under the hood.

Here's the exception again
Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
    at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
    at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
    at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
    at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

Notice the "Label not found: alt.atheism\tfrom"
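
To make the failure mode concrete, here is a minimal sketch of a label-indexed matrix that rejects any label it was not built with. It is a simplified stand-in written for illustration, not Mahout's ConfusionMatrix source; the class name TinyConfusionMatrix and its methods are hypothetical, and the two-label list is made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a label-indexed confusion matrix.
// It only mimics the "look up the label, fail if unknown" behaviour.
class TinyConfusionMatrix {
    private final Map<String, Integer> labelIndex = new HashMap<String, Integer>();
    private final int[][] counts;

    TinyConfusionMatrix(List<String> labels) {
        // Assign each known label a row/column index in insertion order.
        for (String label : labels) {
            labelIndex.put(label, labelIndex.size());
        }
        counts = new int[labels.size()][labels.size()];
    }

    void addInstance(String correctLabel, String classifiedLabel) {
        counts[indexOf(correctLabel)][indexOf(classifiedLabel)]++;
    }

    int getCount(String correctLabel, String classifiedLabel) {
        return counts[indexOf(correctLabel)][indexOf(classifiedLabel)];
    }

    private int indexOf(String label) {
        Integer index = labelIndex.get(label);
        if (index == null) {
            // An exact string match is required, so a stray tab breaks the lookup.
            throw new IllegalArgumentException("Label not found: " + label);
        }
        return index;
    }

    public static void main(String[] args) {
        TinyConfusionMatrix matrix =
            new TinyConfusionMatrix(Arrays.asList("alt.atheism", "sci.med"));
        matrix.addInstance("alt.atheism", "alt.atheism"); // known label, accepted
        try {
            matrix.addInstance("alt.atheism\tfrom", "alt.atheism"); // unknown label
        } catch (IllegalArgumentException e) {
            // Show the tab as a visible escape sequence.
            System.out.println(e.getMessage().replace("\t", "\\t"));
        }
    }
}
```

The point of the sketch is only that the lookup is an exact string match, so "alt.atheism" followed by a tab and "from" counts as a different, unknown label.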

That's an invalid label in the confusion matrix. I think it SHOULD be just
alt.atheism. I'm not sure how the \tfrom is getting in there, but it is.
Perhaps it has something to do with the way my test data was formatted.
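
One way such a label could arise, sketched under assumptions: if each test input line is the label, a tab, and then space-separated tokens (I have not confirmed this is the exact format), then taking the first space-separated token keeps the tab and the first word attached to the label. The sample line and the helper names below are hypothetical, and this is not a claim about what TestClassifier actually does.

```java
// Hypothetical sketch: how the choice of delimiter changes the extracted label.
class LabelSplitSketch {
    // Made-up sample line: label, a tab, then space-separated tokens.
    static final String SAMPLE = "alt.atheism\tfrom mathew subject alt atheism faq";

    // Splitting on a space leaves the tab (and the first word) inside the label.
    static String labelBySpace(String line) {
        return line.split(" ")[0];
    }

    // Cutting at the first tab isolates the label; a line without a tab is
    // returned unchanged.
    static String labelByTab(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line : line.substring(0, tab);
    }

    public static void main(String[] args) {
        // Print tabs as a visible escape sequence for easier comparison.
        System.out.println(labelBySpace(SAMPLE).replace("\t", "\\t"));
        System.out.println(labelByTab(SAMPLE));
    }
}
```

Running a check like this over the first field of each line in bayes-test-input would show whether the files themselves are fine and the problem lies in how the line is split.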

I'll keep digging....

Thanks,
Vijay



On Mon, Jul 4, 2011 at 8:52 PM, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:

> Hi Robin,
>
> The console dump was too large for pastebin, so I uploaded it here --
> http://dl.dropbox.com/u/7881451/build-20news-bayes-console-output.txt
>
> I performed a fresh checkout only hours ago, and I used the script
> examples/bin/build-20news-bayes.sh
> I've opted to avoid Hadoop, but from what I can tell the model was created
> successfully.
>
>
> Thanks,
> Vijay
>
>
> On Mon, Jul 4, 2011 at 8:28 PM, Robin Anil <robin.anil@gmail.com> wrote:
>
>> Can you send me the console dump (the command line plus the log written by
>> the program)? Put it on, say, pastebin.
>>
>> Robin
>>
>> On Mon, Jul 4, 2011 at 3:48 PM, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>>
>> > I tried deleting all the folders from the test and train data except for
>> > alt.atheism, but I get the identical error.
>> >
>> > I might try debugging the problem in Eclipse rather than from the command
>> > line, but Eclipse doesn't quite want to work either.
>> >
>> >
>> > On Mon, Jul 4, 2011 at 8:02 PM, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>> >
>> > > Thanks anyway, Sergey. Could you perhaps upload your bayes-model folder so I
>> > > could try that out?
>> > >
>> > >
>> > >
>> > > On Mon, Jul 4, 2011 at 7:57 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
>> > >
>> > >> Well, that's strange. Sorry, I can't help you at the moment; maybe
>> > >> someone else on the mailing list can.
>> > >>
>> > >> On 4 July 2011 13:49, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>> > >> > Hi Sergey,
>> > >> >
>> > >> > Yes, there were no errors.
>> > >> >
>> > >> > And all the model data seems to have been populated into the bayes-model
>> > >> > folder. Also, each main folder in bayes-model has a _SUCCESS file.
>> > >> >
>> > >> > See the tarball of my trained model here:
>> > >> > http://dl.dropbox.com/u/7881451/bayes-model.tar.gz
>> > >> > Please compare it to your trained model if possible; I would like to know
>> > >> > if it's different in any way.
>> > >> >
>> > >> > Perhaps it's corrupted in some way.
>> > >> >
>> > >> > Thanks,
>> > >> > Vijay
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Mon, Jul 4, 2011 at 7:39 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
>> > >> >
>> > >> >> Stop, did you _train_ the classifier successfully before running the
>> > >> >> _test_?
>> > >> >>
>> > >> >> On 4 July 2011 13:30, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>> > >> >> > Hi Sergey,
>> > >> >> >
>> > >> >> > I've tried both the sh script and following the instructions at
>> > >> >> > https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html - like you
>> > >> >> > suggested. Both return the same results.
>> > >> >> >
>> > >> >> > I've uploaded my bayes-test-input folder to Dropbox; the first file is
>> > >> >> > here...
>> > >> >> > http://dl.dropbox.com/u/7881451/bayes-test-input/alt.atheism.txt
>> > >> >> >
>> > >> >> > Thanks,
>> > >> >> > Vijay
>> > >> >> >
>> > >> >> > On Mon, Jul 4, 2011 at 7:23 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
>> > >> >> >
>> > >> >> >> Paste your bayes-test-input file somewhere.
>> > >> >> >>
>> > >> >> >> On 4 July 2011 13:20, Sergey Bartunov <sbos.net@gmail.com> wrote:
>> > >> >> >> > Yes, I worked WITH Hadoop, but there should be no difference.
>> > >> >> >> >
>> > >> >> >> > Why do you use examples/bin/build/20news-bayes.sh instead of running
>> > >> >> >> > bin/mahout directly? Is it the same?
>> > >> >> >> >
>> > >> >> >> > On 4 July 2011 13:12, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>> > >> >> >> >> Thanks Sergey,
>> > >> >> >> >>
>> > >> >> >> >> I'm still receiving the same error after following those steps.
>> > >> >> >> >> I've chosen not to use Hadoop - does yours work WITH Hadoop?
>> > >> >> >> >>
>> > >> >> >> >> A few bits of info that might be relevant.
>> > >> >> >> >>
>> > >> >> >> >> My examples/bin/work folder contains the expected folders from test data
>> > >> >> >> >> preparation and training...
>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-test
>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-train
>> > >> >> >> >> drwxr-xr-x   3 Vijay  staff  102  4 Jul 19:03 bayes-model
>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 18:20 bayes-test-input
>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 17:49 bayes-train-input
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> I appreciate your help; do you have any other suggestions?
>> > >> >> >> >>
>> > >> >> >> >> Regards,
>> > >> >> >> >> Vijay
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> On Mon, Jul 4, 2011 at 6:58 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
>> > >> >> >> >>
>> > >> >> >> >>> When I started with Mahout I had the same errors. In my case, I just
>> > >> >> >> >>> hadn't run PrepareTwentyNewsgroups. You may try to carefully repeat
>> > >> >> >> >>> all the steps from https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html
>> > >> >> >> >>>
>> > >> >> >> >>> On 4 July 2011 12:52, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
>> > >> >> >> >>> > Hi All,
>> > >> >> >> >>> >
>> > >> >> >> >>> > I'm new to Mahout and I'm interested in experimenting with its
>> > >> >> >> >>> > classifiers.
>> > >> >> >> >>> >
>> > >> >> >> >>> > Right now, I'm just trying to get up and running with the demos and
>> > >> >> >> >>> > examples.
>> > >> >> >> >>> >
>> > >> >> >> >>> > After checking out the Mahout trunk, I tried running the 20news
>> > >> >> >> >>> > classification example, but after running the
>> > >> >> >> >>> > ./examples/bin/build/20news-bayes.sh script I get the following error
>> > >> >> >> >>> > during the classification phase.
>> > >> >> >> >>> >
>> > >> >> >> >>> > Does anyone else get the same thing? Or have any recommendations about
>> > >> >> >> >>> > how to fix it?
>> > >> >> >> >>> > I'd just like to get a sample classifier working before I embark on my
>> > >> >> >> >>> > own classification journey.
>> > >> >> >> >>> >
>> > >> >> >> >>> >
>> > >> >> >> >>> > INFO: Loading model from: {basePath=examples/bin/work/20news-bydate/bayes-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/20news-bydate/bayes-test-input}
>> > >> >> >> >>> > Jul 4, 2011 6:28:25 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: Testing Bayes Classifier
>> > >> >> >> >>> > Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: Read 50000 feature weights
>> > >> >> >> >>> > Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: Read 100000 feature weights
>> > >> >> >> >>> > Jul 4, 2011 6:28:28 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: 193370.88331085522
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: rec.sport.baseball -129829.34738930278 531784.7805631821 -0.2441388925268003
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: sci.crypt -193023.42370049533 531784.7805631821 -0.3629728242618669
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: rec.sport.hockey -167853.6159738822 531784.7805631821 -0.31564200802459647
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: talk.politics.guns -203524.0148974065 531784.7805631821 -0.3827187658170024
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: soc.religion.christian -163900.9258713857 531784.7805631821 -0.308209132457322
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: sci.electronics -142854.1677345925 531784.7805631821 -0.26863154598614886
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: comp.os.ms-windows.misc -531784.7805631821 531784.7805631821 -1.0
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: misc.forsale -143454.70176448982 531784.7805631821 -0.26976082619845826
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: talk.religion.misc -139428.73484148504 531784.7805631821 -0.2621901565024562
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: alt.atheism -139569.06867597546 531784.7805631821 -0.2624540486626301
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: comp.windows.x -178029.10523376046 531784.7805631821 -0.33477660839638973
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: talk.politics.mideast -193075.00789450994 531784.7805631821 -0.36306982627452317
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: comp.sys.ibm.pc.hardware -138410.02049984262 531784.7805631821 -0.2602745049477736
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: comp.sys.mac.hardware -125200.9927438868 531784.7805631821 -0.23543545682389364
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: sci.space -192437.0009266271 531784.7805631821 -0.3618700797018455
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: rec.motorcycles -143142.20855440624 531784.7805631821 -0.26917319522159455
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: rec.autos -141800.97549909537 531784.7805631821 -0.2666510601317365
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: comp.graphics -166882.18654471825 531784.7805631821 -0.3138152738556811
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: talk.politics.misc -165196.84193278523 531784.7805631821 -0.3106460507535303
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
>> > >> >> >> >>> > INFO: sci.med -192698.5183245711 531784.7805631821 -0.36236185270382393
>> > >> >> >> >>> > Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
>> > >> >> >> >>> >     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
>> > >> >> >> >>> >     at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
>> > >> >> >> >>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >> >> >> >>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >> >> >> >>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >> >> >> >>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> > >> >> >> >>> >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> > >> >> >> >>> >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> > >> >> >> >>> >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>> > >> >> >> >>> >
>> > >> >> >> >>> >
>> > >> >> >> >>> > Any help is greatly appreciated.
>> > >> >> >> >>> >
>> > >> >> >> >>> > Regards,
>> > >> >> >> >>> > --
>> > >> >> >> >>> >  Vijay Santhanam
>> > >> >> >> >>> >  Software Engineer
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> --
>> > >> >> >> >>  Vijay Santhanam
>> > >> >> >> >>  Software Engineer
>> > >> >> >> >>  http://au.linkedin.com/in/vijaysanthanam
>> > >> >> >> >>  0407525087
>> > >> >> >> >>
>> > >> >> >> >
>> > >> >> >>
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >>
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >>
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> >
>>
>
>
>
>



-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087
