mahout-user mailing list archives

From "Philippe Lamarche" <philippe.lamar...@gmail.com>
Subject Problems with the Bayesian classifiers.
Date Sun, 20 Jul 2008 01:13:23 GMT
 Hi,

I have been working for a little while with Mahout and the Bayesian
classifier for a school project.

I am using the Enron email corpus and the UC Berkeley classified
emails (http://www.cs.cmu.edu/~enron/). I did a few tests and I can't
seem to make it work. I wonder if I am doing something wrong.

For example, fewer than 10% of my predictions are correct with Bayes,
and around 1% with CBayes. The problem seems to be that all instances
of a class are predicted as one other class, or that they are all
predicted as the class containing the most features.

I also tested with the 20News corpus and I get similar results, where
all instances of a class are predicted as another class (e.g. all
421 "rec.motorcycles" instances get predicted as "talk.politics.mideast").
Attached are two confusion matrices showing the results for Bayes and
CBayes. Both runs used the same split into training and testing sets.
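
To make the symptom concrete, here is a minimal sketch in plain Python (not Mahout code) of how a confusion matrix can be tallied and scanned for classes that collapse entirely into one other class. The function names confusion_matrix and collapsed_classes are hypothetical, and the only data used is the single rec.motorcycles case quoted above; the rest of the attached matrices is not reproduced.

```python
from collections import Counter, defaultdict


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs as a nested mapping."""
    matrix = defaultdict(Counter)
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def collapsed_classes(matrix):
    """Return {actual: predicted} for every class whose instances
    were all assigned to a single, different class."""
    collapsed = {}
    for a, row in matrix.items():
        if len(row) == 1:
            p = next(iter(row))  # the only predicted label in this row
            if p != a:
                collapsed[a] = p
    return collapsed


# The one case quoted in the message: 421 rec.motorcycles instances,
# every one predicted as talk.politics.mideast.
actual = ["rec.motorcycles"] * 421
predicted = ["talk.politics.mideast"] * 421

matrix = confusion_matrix(actual, predicted)
result = collapsed_classes(matrix)
```

A row that contains a single off-diagonal entry, as here, often points to a problem in the pipeline (for instance a label or feature mismatch between training and testing) rather than to a merely weak model, though that is only a guess without seeing the setup.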

Am I doing something wrong?

Thanks,

Philippe Lamarche.
