mahout-user mailing list archives

From Loek Cleophas <loek.cleop...@kalooga.com>
Subject Re: Problem running Twentynewsgroups classification example
Date Tue, 19 Jan 2010 07:32:47 GMT
Are you sure it is reading from a local dir? Note that I pass -source
hdfs to the TestClassifier, and that when I instead try a run with a
full local path, i.e. as:

bin/hadoop jar ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job \
  org.apache.mahout.classifier.bayes.TrainClassifier \
  -i ~/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse \
  -o 8newsmodel-0.2 -ng 3 -type bayes -source hdfs

I get the following exception, which seems to imply it is not reading  
the input from a local dir...:

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist:
hdfs://localhost:9000/Users/loekcleophas/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
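
To illustrate why the exception names an hdfs:// location, here is a minimal Java sketch of how a path without a scheme ends up qualified against the default filesystem. It uses only java.net.URI, not the Hadoop API. The class PathResolutionSketch and the helper qualify are hypothetical names made up for this illustration, and the /user/<name> working directory is an assumption based on the model paths in the log output quoted below.

```java
import java.net.URI;

public class PathResolutionSketch {

    // Hypothetical helper (not part of Hadoop or Mahout): qualifies a
    // scheme-less path against a default filesystem URI and an assumed
    // per-user working directory of the form /user/<name>/.
    public static String qualify(String defaultFs, String user, String path) {
        URI workingDir = URI.create(defaultFs + "/user/" + user + "/");
        // A relative path is resolved below the working directory; an
        // absolute path keeps only the scheme and authority of the base.
        return workingDir.resolve(path).toString();
    }

    public static void main(String[] args) {
        String fs = "hdfs://localhost:9000";
        String user = "loekcleophas";

        // Relative name, as passed to TestClassifier with -d
        System.out.println(qualify(fs, user, "8newsInput"));

        // Absolute local-looking path, as passed to TrainClassifier with -i
        // (the shell has already expanded ~ before Hadoop sees it)
        System.out.println(qualify(fs, user,
            "/Users/loekcleophas/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse"));
    }
}
```

Under these assumptions, both arguments are interpreted on HDFS and neither on the local disk, which is consistent with the exception above.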


On Jan 19, 2010, at 08:12, Robin Anil wrote:

> Is it reading the directory correctly? Note, 8newsinput is read from
> local dir.
>
> On Tue, Jan 19, 2010 at 12:39 PM, Loek Cleophas
> <loek.cleophas@kalooga.com> wrote:
>
>> Hi
>>
>> I've recently started working with Mahout. At first, I tried the trunk,
>> which I got to compile (both from within Eclipse with a Maven plugin,
>> and command line), but which apparently is in a state of flux regarding
>> building and running the examples (?).
>>
>> I tried running the Twentynewsgroups classification example, after
>> copying the relevant Maven file to the examples directory, as suggested
>> on the mailing list some time ago. I could get the example's data set
>> from wikipedia, could get it processed into input data located on the
>> single-node/local hdfs, and could get a model trained and output to
>> that hdfs. However, the example class TestClassifier to test with the
>> trained model didn't work for me, neither in mapreduce nor in
>> sequential mode. In the mapreduce case, and even with quite high JVM
>> maximum heap sizes (I tried 2048), I get heap space out-of-memory
>> errors / object configuration errors. In the sequential case, I
>> seemingly get 0 items classified, see output below. (Note that I
>> reduced the data set to just 8 instead of 20 newsgroups, thinking the
>> data size might have something to do with the problem.)
>>
>> I also tried release 0.2, which I got to compile and for which I got
>> the example running more easily, but still with the same errors when
>> testing with the trained model. Any ideas what might be going wrong, or
>> what I might be doing wrong?
>>
>> Kind regards,
>> Loek Cleophas
>>
>>
>> Output of TestClassifier:
>>
>> bin/hadoop jar ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job \
>>   org.apache.mahout.classifier.bayes.TestClassifier \
>>   -m 8newsmodel-0.2 -d 8newsInput -ng 3 -type bayes -source hdfs -method sequential
>>
>> <... reading all the feature weights ...>
>>
>> 10/01/13 10:22:08 INFO io.SequenceFileModelReader: Read 1950000 feature weights
>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_k/part-00000
>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_kSigma_j/part-00000
>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: 420716.6056712613
>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-thetaNormalizer/part-00000
>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-tfIdf/trainer-tfIdf/part-00000
>> comp.windows.x -4443829.798557077 7727496.583973498 -0.5750671967650419
>> comp.graphics -3252365.124498224 7727496.583973498 -0.4208821174044246
>> soc.religion.christian -5106741.34456479 7727496.583973498 -0.6608532645819548
>> alt.atheism -3447983.6168798 7727496.583973498 -0.44619671835646907
>> misc.forsale -2276588.3662840202 7727496.583973498 -0.2946087832643716
>> comp.sys.mac.hardware -2445489.855812473 7727496.583973498 -0.31646598988918556
>> comp.os.ms-windows.misc -7727496.583973498 7727496.583973498 -1.0
>> comp.sys.ibm.pc.hardware -2687646.590023761 7727496.583973498 -0.3478030123750332
>> 10/01/13 10:23:17 INFO bayes.TestClassifier:
>> nCalls = 0;
>> sumTime = 0.0s;
>> minTime = 0.0ms;
>> maxTime = 0.0ms;
>> meanTime = 0.0ms;
>> stdDevTime = 0.0ms;
>> 10/01/13 10:23:18 INFO bayes.TestClassifier:
>> =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances          :          0             ?%
>> Incorrectly Classified Instances        :          0             ?%
>> Total Classified Instances              :          0
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a       b       c       d       e       f       g       h       <--Classified as
>> 0       0       0       0       0       0       0       0        |  0        a     = comp.windows.x
>> 0       0       0       0       0       0       0       0        |  0        b     = comp.graphics
>> 0       0       0       0       0       0       0       0        |  0        c     = soc.religion.christian
>> 0       0       0       0       0       0       0       0        |  0        d     = alt.atheism
>> 0       0       0       0       0       0       0       0        |  0        e     = misc.forsale
>> 0       0       0       0       0       0       0       0        |  0        f     = comp.sys.mac.hardware
>> 0       0       0       0       0       0       0       0        |  0        g     = comp.os.ms-windows.misc
>> 0       0       0       0       0       0       0       0        |  0        h     = comp.sys.ibm.pc.hardware
>> Default Category: unknown: 8
>>
>>

