mahout-user mailing list archives

From "Divya" <di...@k2associates.com.sg>
Subject FW: classification example doubts
Date Tue, 23 Nov 2010 10:05:59 GMT
Hi,

I am able to get results when I run the test classifier.
You can view my results at http://pastebin.com/D5ejTwEW

Steps I followed:
1) Generate the input data set to train the classifier
$ bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
    -p examples/bin/work/20news-bydate/20news-bydate-train \
    -o examples/bin/work/20news-bydate/bayes-train-input \
    -a org.apache.mahout.vectorizer.DefaultAnalyzer -c UTF-8
2) Generate the input data set to test the classifier
$ bin/mahout org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
    -p examples/bin/work/20news-bydate/20news-bydate-test \
    -o examples/bin/work/20news-bydate/bayes-test-input \
    -a org.apache.mahout.vectorizer.DefaultAnalyzer -c UTF-8
3) Train the classifier
$ bin/mahout trainclassifier \
    -i examples/bin/work/20news-bydate/bayes-train-input \
    -o examples/bin/work/20news-bydate/bayes-model
4) Test the classifier
$ bin/mahout testclassifier \
    -m examples/bin/work/20news-bydate/bayes-model \
    -d examples/bin/work/20news-bydate/bayes-test-input
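
As a rough sketch only, the four steps above could be assembled in a small Python wrapper so that long paths are built from one base directory and cannot be split by line wrapping. The helper names build_commands and run_all are made up for illustration and are not part of Mahout; the flags and paths are the ones used in the steps above.

```python
import subprocess

# Paths and class names are taken from the steps above.
WORK = "examples/bin/work/20news-bydate"
ANALYZER = "org.apache.mahout.vectorizer.DefaultAnalyzer"
PREPARE = "org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups"


def build_commands(work=WORK):
    """Return the four bin/mahout invocations as argument lists."""
    def prepare(split):
        # split is "train" or "test"
        return ["bin/mahout", PREPARE,
                "-p", f"{work}/20news-bydate-{split}",
                "-o", f"{work}/bayes-{split}-input",
                "-a", ANALYZER, "-c", "UTF-8"]

    return [
        prepare("train"),
        prepare("test"),
        ["bin/mahout", "trainclassifier",
         "-i", f"{work}/bayes-train-input",
         "-o", f"{work}/bayes-model"],
        ["bin/mahout", "testclassifier",
         "-m", f"{work}/bayes-model",
         "-d", f"{work}/bayes-test-input"],
    ]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in build_commands():
        print(" ".join(argv))
        if not dry_run:
            # check=True stops at the first failing step
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With dry_run=True the script only prints the commands, which makes it easy to compare them against what was actually typed.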

I have not passed any parameters except the required ones.

But when I pass other parameters such as -type bayes -ng 3 -source hdfs,

I do not get the expected results.
Can anyone please explain the reason behind this?
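
To make the -ng question concrete, here is a small Python sketch of what word n-grams of size 1 and size 3 look like for one sentence. It assumes -ng selects an n-gram size; whether Mahout emits only grams of that exact size or also the shorter ones is not stated in this thread, so the exact-size behaviour, the function word_ngrams and the sample sentence are all illustrative assumptions.

```python
def word_ngrams(text, n):
    """Return the word n-grams of size n in text, as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


doc = "the shuttle reached orbit today"
unigrams = set(word_ngrams(doc, 1))
trigrams = set(word_ngrams(doc, 3))

print(sorted(unigrams))
print(sorted(trigrams))
# Features shared between the two representations of the same sentence:
print(unigrams & trigrams)
```

Under these assumptions the two feature sets do not overlap, so it may be worth checking that training and testing were run with the same value.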

Thanks 
Regards,
Divya 


-----Original Message-----
From: Divya [mailto:divya@k2associates.com.sg] 
Sent: Tuesday, November 23, 2010 1:40 PM
To: 'user@mahout.apache.org'
Subject: RE: classification example doubts

I am following the same steps,
but no success...

-----Original Message-----
From: Sreejith S [mailto:srssreejith@gmail.com] 
Sent: Friday, November 19, 2010 4:00 PM
To: user@mahout.apache.org
Subject: Re: classification example doubts

Step 1: You can provide your own sample data set using the prepare20news
example; just provide your input dir. This performs some normalization on
each file. This is a must.

Step 2: Train the classifier with the normalized list of files.
You get a model dir which contains the trained data set in HDFS.

Step 3: Test the classifier.
Using the trained model and sample input, you can test the classifier.
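
To give a feel for what "normalization on each file" in step 1 might mean, here is a minimal Python sketch. It assumes each document is collapsed to a single line of the form label, tab, lowercase tokens; that layout, the regex tokenizer and the name normalize_document are assumptions for illustration, not the actual behaviour of PrepareTwentyNewsgroups or its analyzer.

```python
import re


def normalize_document(label, raw_text):
    """Collapse one raw document into a single 'label<TAB>tokens' line."""
    # Lowercase, then keep only runs of letters and digits as tokens.
    tokens = re.findall(r"[a-z0-9]+", raw_text.lower())
    return label + "\t" + " ".join(tokens)


raw = "Subject: Re: Orbit!\n\nThe shuttle reached orbit."
print(normalize_document("sci.space", raw))
```

The point is only that training and test documents need to go through the same preparation before the classifier sees them.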

Regards
Sreejith


On Fri, Nov 19, 2010 at 1:15 PM, Divya <divya@k2associates.com.sg> wrote:

> For my first question, you say we can put our own input documents in the
> directory. Should those documents also be in a format similar to
> bayes-train-input?
> If yes, then I generated my input data using PrepareTwentyNewsgroups
> and used that as my input for testclassifier,
> but didn't get the expected results.
> As far as I can see, it didn't read the files in my input directory.
> I tried replacing one of the files in the input directory with one of the
> files from the train-input directory.
> Still the same result.
> Why is it not reading my files?
>
> Results below :
>
> 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.sys.mac.hardware -121323.6282757108 547567.2698760114 -0.22156844455510052
> 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.space -189203.04544769705 547567.2698760114 -0.3455338838834164
> 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.motorcycles -138625.2628242977 547567.2698760114 -0.25316572127418674
> 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.autos -136935.18434679657 547567.2698760114 -0.25007919917821886
> 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.graphics -161979.38306986375 547567.2698760114 -0.29581640828631267
> 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: talk.politics.misc -159579.70032298338 547567.2698760114 -0.29143396455949216
> 10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.med -183835.5334355675 547567.2698760114 -0.3357314133790253
> 10/11/19 10:45:12 INFO bayes.TestClassifier:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :          0             ?%
> Incorrectly Classified Instances        :          0             ?%
> Total Classified Instances              :          0
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  <--Classified as
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  a = rec.sport.baseball
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  b = sci.crypt
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  c = rec.sport.hockey
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  d = talk.politics.guns
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  e = soc.religion.christian
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  f = sci.electronics
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  g = comp.os.ms-windows.misc
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  h = misc.forsale
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  i = talk.religion.misc
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  j = alt.atheism
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  k = comp.windows.x
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  l = talk.politics.mideast
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  m = comp.sys.ibm.pc.hardware
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  n = comp.sys.mac.hardware
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  o = sci.space
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  p = rec.motorcycles
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  q = rec.autos
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  r = comp.graphics
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  s = talk.politics.misc
> 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  |  0  t = sci.med
> Default Category: unknown: 20
>
>
> 10/11/19 10:45:12 INFO driver.MahoutDriver: Program took 5485 ms
>
> Am I missing anything?
>
>
> Coming to my second question: that means we are testing the classifier
> against our own inputs.
> I still don't understand.
> What I understood about classification is that we have a set of documents
> which will act as the model for classifying new documents in the system.
> Am I right?
> Doesn't Mahout work in the same way?
>
> Third question: yes, I am looking for Mahout's API for classification.
>
>
> @ Jaganadh - Thanks for clearing my doubts
>
> Regards,
> Divya
>
>
> -----Original Message-----
> From: JAGANADH G [mailto:jaganadhg@gmail.com]
> Sent: Friday, November 19, 2010 3:09 PM
> To: user@mahout.apache.org
> Subject: Re: classification example doubts
>
> >
> > 1)      I want to  know what should go in "bayes-test-input".
> >
> >
> After preparing the 20news-group data for training you can separate some
> documents for testing your classifier.
> These documents should go to "bayes-test-input".
>
> Or you can even put a new set of documents in the directory.
>
>
> > 2)      If we take Wikipedia example
> > https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
> >
> >
> >
> > For trainclassifier, we used Wikipediainput to generate the model.
> >
> > For testclassifier, we again used wikipediamodel as input and Wikipedia
> > input as the test documents directory.
> >
> > I don't understand why we are doing so.
> >
> >
>
> We are testing the classifier against the development set we used.
>
>
>
> > 3)      The last thing I want to know: when we run testclassifier from
> > the command line, we can see the output.
> >
> > How can we make use of this output?
> >
>
>
> Are you looking for Mahout API usage for classification?
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog
>
>
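
For reading the summary and confusion matrix quoted above, the following Python sketch shows how the counts relate to each other. It is not Mahout code; the function summarize and the two sample matrices are illustrative. Rows are taken to be the true category and columns the assigned category, so the diagonal holds the correctly classified documents.

```python
def summarize(confusion):
    """Return (correct, incorrect, total, accuracy_percent or None)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    incorrect = total - correct
    # With no classified instances the percentage is undefined.
    accuracy = None if total == 0 else 100.0 * correct / total
    return correct, incorrect, total, accuracy


# A 20 x 20 matrix of zeros, like the one quoted above.
empty = [[0] * 20 for _ in range(20)]
print(summarize(empty))   # (0, 0, 0, None)

# A hypothetical two-category result for comparison.
small = [[8, 2], [1, 9]]
print(summarize(small))   # (17, 3, 20, 85.0)
```

A total of 0 means no test document was counted at all, which is consistent with the percentages being shown as "?%" and points at the test input not being read rather than at poor classification.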

