mahout-user mailing list archives

From Jyoti Gupta <jyotigupta.i...@gmail.com>
Subject Re: Naive Bayes score comparison across multiple classifiers
Date Wed, 25 May 2011 13:29:52 GMT
I was using the latest 0.6-SNAPSHOT version. It worked fine with the 0.4
version as well, but still gave the same accuracy of 48%.
It might be because my categories have some features in common, meaning a
sample can belong to both categories at the same time.

But regarding my original question, are the scores from different
classifiers comparable?
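For illustration, raw log scores from independently trained one-vs-all classifiers are generally not directly comparable, because each model is fit on a different training set. One common workaround is to turn each binary classifier's pair of log scores into a posterior with a softmax and compare the posteriors instead. A minimal sketch in plain Python (this is not the Mahout API; the function names and the log scores below are hypothetical):

```python
import math

def binary_posterior(score_pos, score_neg):
    """Convert a binary classifier's two log scores into
    P(positive | input) via a numerically stable softmax."""
    m = max(score_pos, score_neg)
    e_pos = math.exp(score_pos - m)
    e_neg = math.exp(score_neg - m)
    return e_pos / (e_pos + e_neg)

def pick_category(ova_scores):
    """ova_scores maps category -> (score for category, score for rest).
    Posteriors are comparable across classifiers; raw log scores are not,
    since each binary model saw different training data."""
    posteriors = {c: binary_posterior(sp, sn)
                  for c, (sp, sn) in ova_scores.items()}
    return max(posteriors, key=posteriors.get)

# Made-up log scores from three hypothetical one-vs-all classifiers:
scores = {
    "sports":   (-120.3, -125.9),
    "politics": (-98.7,  -97.2),
    "tech":     (-110.0, -118.4),
}
print(pick_category(scores))  # "tech": largest positive-vs-rest margin
```

Note the winner is decided by each classifier's margin between its two scores, not by the absolute score values.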

On Wed, May 25, 2011 at 6:41 PM, Robin Anil <robin.anil@gmail.com> wrote:

> Might have to check the input format and the model generated. Cannot tell
> otherwise. Does it work when you change the method from sequential to
> mapreduce during classification?
>
>
> On Wed, May 25, 2011 at 4:28 PM, Jyoti Gupta <jyotigupta.iitd@gmail.com> wrote:
>
> > It's a plain-text classification, and I used 1K samples for each category.
> >
> > Btw, I tried again with the cbayes algorithm, and on testing all the
> > samples got classified as the Unknown category.
> >
> > The command I used to train is:
> > ./bin/mahout trainclassifier -i <path to the train directory> -o <path to the model directory> -type cbayes -ng 1 -source hdfs
> >
> > To test the classifier:
> > ./bin/mahout testclassifier -m <path to the model directory> -d <path to the test directory> -type cbayes -ng 1 -source hdfs -method sequential
> >
> > Is anything wrong with what I am doing?
> >
> > On Wed, May 25, 2011 at 2:58 PM, Robin Anil <robin.anil@gmail.com> wrote:
> >
> > > Depends on the size of the data. The NB implementation works well for a
> > > lot of data and large records (especially text). If you are trying other
> > > types of data, like attribute enums and dense features, it might not
> > > work as well.
> > >
> > > Robin
> > >
> > > On Wed, May 25, 2011 at 1:44 PM, Jyoti Gupta <jyotigupta.iitd@gmail.com> wrote:
> > >
> > > > I have tried that previously, but it was not giving good accuracy: I
> > > > got around 50% accuracy for 14 categories.
> > > >
> > > > On Wed, May 25, 2011 at 1:17 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > >
> > > > > Why not just use the multi-class capability of the Naive Bayes
> > > > > categorizers?
> > > > >
> > > > > On Wed, May 25, 2011 at 12:13 AM, Jyoti Gupta <jyotigupta.iitd@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am using the NaiveBayes classifier to classify my input into one
> > > > > > of N categories. I am creating N binary classifiers using a
> > > > > > One-vs-All approach. The training document size is different for
> > > > > > each classifier, and the probability of each category is the same
> > > > > > (1/N).
> > > > > >
> > > > > > Can I compare the scores across these classifiers to get a final
> > > > > > category as output? Or can you suggest any way to normalize them?
> > > > > >
> > > > > > Also, while testing I found that the label returned by the
> > > > > > ClassifierContext.classify method has a lower score value than the
> > > > > > other label.
> > > > > > e.g. there are two categories, X and Non-X:
> > > > > > classifier.classify(input) returns (X, score1)
> > > > > > and classifier.classify(input, 2) returns a list [{X, score1},
> > > > > > {Non-X, score2}]
> > > > > > Here I found that score1 < score2. I did not go into the
> > > > > > implementation, but I thought that a greater score means a greater
> > > > > > probability.
> > > > > >
> > > > > > Thanks,
> > > > > > Jyoti
> > > > > >
> > > > > > --
> > > > > > "Be the change you want to see"
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
>
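On the score1 < score2 observation in the quoted question: in a textbook Naive Bayes, scores are sums of log probabilities, so they are negative numbers, and the larger score (closer to zero) corresponds to the higher probability; a returned winner with the smaller score would suggest the implementation ranks by something else, such as a cost where smaller is better. A toy illustration of the textbook convention (plain Python, not Mahout's implementation; the priors and per-word likelihoods are made up):

```python
import math

# Equal priors for the two categories, as in the one-vs-all setup above.
log_priors = {"X": math.log(0.5), "Non-X": math.log(0.5)}

# Hypothetical per-word log-likelihoods for each class.
log_likelihood = {
    "X":     {"goal": math.log(0.08), "match": math.log(0.05)},
    "Non-X": {"goal": math.log(0.01), "match": math.log(0.02)},
}

def nb_log_score(cls, words):
    """Sum of log prior and per-word log-likelihoods: always negative,
    and larger (closer to zero) means more probable."""
    return log_priors[cls] + sum(log_likelihood[cls][w] for w in words)

doc = ["goal", "match"]
scores = {c: nb_log_score(c, doc) for c in log_priors}
best = max(scores, key=scores.get)
print(best, scores)  # "X" wins: its log score is the larger (less negative) one
```

Under this convention the predicted label always carries the greater score, so if the library returns the opposite, it is worth checking whether its "score" is a log-likelihood or a cost.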



