mahout-user mailing list archives

From "Divya" <di...@k2associates.com.sg>
Subject RE: classification example doubts
Date Fri, 19 Nov 2010 07:45:48 GMT
On my first question: you say we can put our own input documents in the directory.
Should those documents also be in a format similar to bayes-train-input?
If yes, I generated my input data using PrepareTwentyNewsgroups
and used that as the input for testclassifier,
but I didn't get the expected results.
From what I observed, it didn't read the files in my input directory.
I tried replacing one of the files in the input directory with one of the files
from the train-input directory.
Still the same result.
Why is it not reading my files?
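
One way to check whether the files in the input directory are readable at all is to count the documents in each file outside Mahout. The following is a minimal sketch, not part of Mahout. It assumes the layout that PrepareTwentyNewsgroups appears to produce (one file per category, one document per line, with the label and the text separated by a tab); that layout is an assumption and should be compared against the actual bayes-train-input files. The function name summarize_input_dir is hypothetical.

```python
import os

def summarize_input_dir(path):
    """Return {file name: (documents, malformed lines)} for each regular file in path.

    Assumption: one document per line, formatted as label, a tab, then the text.
    """
    summary = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue  # subdirectories are skipped, not descended into
        docs = malformed = 0
        with open(full, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if not line.strip():
                    continue  # blank lines are ignored
                label, sep, text = line.partition("\t")
                if sep and label and text.strip():
                    docs += 1
                else:
                    malformed += 1
        summary[name] = (docs, malformed)
    return summary
```

Calling summarize_input_dir on the test input directory and finding an empty result, or files with zero documents, would point to a path or format problem rather than a classifier problem.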

Results below:

10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.sys.mac.hardware -121323.6282757108 547567.2698760114 -0.2215684445551005
2
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.space -189203.04544769705 547567.2698760114 -0.3455338838834164
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.motorcycles -138625.2628242977 547567.2698760114 -0.25316572127418674
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.autos -136935.18434679657 547567.2698760114 -0.25007919917821886
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.graphics -161979.38306986375 547567.2698760114 -0.29581640828631267
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: talk.politics.misc -159579.70032298338 547567.2698760114 -0.29143396455949216
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.med -183835.5334355675 547567.2698760114 -0.3357314133790253
10/11/19 10:45:12 INFO bayes.TestClassifier:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :          0             ?%
Incorrectly Classified Instances        :          0             ?%
Total Classified Instances              :          0

=======================================================
Confusion Matrix
-------------------------------------------------------
a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   a = rec.sport.baseball
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   b = sci.crypt
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   c = rec.sport.hockey
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   d = talk.politics.guns
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   e = soc.religion.christian
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   f = sci.electronics
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   g = comp.os.ms-windows.misc
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   h = misc.forsale
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   i = talk.religion.misc
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   j = alt.atheism
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   k = comp.windows.x
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   l = talk.politics.mideast
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   m = comp.sys.ibm.pc.hardware
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   n = comp.sys.mac.hardware
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   o = sci.space
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   p = rec.motorcycles
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   q = rec.autos
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   r = comp.graphics
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   s = talk.politics.misc
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  0   t = sci.med
Default Category: unknown: 20


10/11/19 10:45:12 INFO driver.MahoutDriver: Program took 5485 ms
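
The "?%" in the summary is consistent with an undefined ratio: accuracy is the number of correctly classified instances divided by the total, and here the total is 0. A minimal sketch of that arithmetic follows. It is not Mahout's own code, and the function name summarize and the 2x2 numbers are made up for illustration.

```python
def summarize(confusion):
    """Compute (correct, incorrect, total, accuracy) from a square confusion matrix.

    confusion[i][j] is the number of documents of true class i labelled as class j.
    accuracy is None when no documents were classified, since 0/0 is undefined.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    accuracy = correct / total if total else None
    return correct, total - correct, total, accuracy

# An all-zero 20x20 matrix, as in the output above: nothing was classified.
empty = [[0] * 20 for _ in range(20)]
print(summarize(empty))

# A small made-up 2x2 example for comparison.
print(summarize([[8, 2], [1, 9]]))
```

An all-zero matrix therefore says nothing about the quality of the model; it only shows that no test documents reached the classifier.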

Am I missing anything?


Coming to my second question: that means we are testing the classifier against
the very input we trained on.
I still don't understand.
My understanding of classification is that we have a set of documents which
acts as the model for classifying new documents in the system.
Am I right?
Doesn't Mahout work the same way?
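
The workflow described here (build a model from labelled documents, then use it to label new ones) can be sketched with a toy naive Bayes classifier. This is a minimal illustration in plain Python under simplifying assumptions (whitespace tokenization, add-one smoothing). The function names train and classify and the sample documents are made up, and the sketch does not reproduce Mahout's implementation.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """Build a model (per-label word counts and document counts) from (label, text) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in labelled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary

def classify(model, text):
    """Return the label with the highest log-probability for a new document."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocabulary)
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)  # add-one smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training documents, two per label.
training = [
    ("sci.space", "rocket orbit launch moon"),
    ("sci.space", "orbit satellite launch"),
    ("rec.autos", "engine car wheels"),
    ("rec.autos", "car brakes engine oil"),
]
model = train(training)
print(classify(model, "the satellite reached orbit"))
```

The model is built once from the training documents; the new document passed to classify was never part of that set.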

On the third question: yes, I am looking for Mahout's API for classification.


@Jaganadh - Thanks for clearing up my doubts.

Regards,
Divya 

 
-----Original Message-----
From: JAGANADH G [mailto:jaganadhg@gmail.com] 
Sent: Friday, November 19, 2010 3:09 PM
To: user@mahout.apache.org
Subject: Re: classification example doubts

>
> 1)      I want to  know what should go in "bayes-test-input".
>
>
After preparing the 20 Newsgroups data for training, you can set aside some
documents for testing your classifier.
These documents should go in "bayes-test-input".

Or you can even put a new set of documents in the directory.
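
Setting aside some documents for testing can be sketched as follows. This is a minimal illustration, not a Mahout utility: the function name split_holdout, the 20% test fraction, and the fixed seed are assumptions chosen for the example.

```python
import random

def split_holdout(documents, test_fraction=0.2, seed=42):
    """Shuffle documents and split them into (train, test) lists.

    The test documents are held out so the classifier is evaluated on
    documents it did not see during training.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction)) if shuffled else 0
    return shuffled[n_test:], shuffled[:n_test]

# Ten placeholder documents; real input would be the prepared newsgroup documents.
docs = [f"doc-{i}" for i in range(10)]
train, test = split_holdout(docs)
print(len(train), len(test))
```

Splitting each category separately keeps every category represented in both sets.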


> 2)      If we take Wikipedia example
> https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
>
>
>
> To train the classifier, we used Wikipediainput to generate the model.
>
> To test the classifier, we again used wikipediamodel as input and Wikipedia
> input as the test documents directory.
>
> I don't understand why we are doing this.
>
>

We are testing the classifier against the development set we used.



> 3)      The last thing I want to know: when we run testclassifier from the
> command line, we can see the output.
>
> How can we make use of this output?
>


Are you looking for Mahout API usage for classification?

-- 
**********************************
JAGANADH G
http://jaganadhg.freeflux.net/blog

