mahout-user mailing list archives

From Daniel Korzekwa <daniel.korze...@gmail.com>
Subject Bayes classification - strange results
Date Wed, 18 Jan 2012 19:22:06 GMT
Hello,

I'm training a Bayes classifier on this data (6 records):

target, words
T A A A
T A A A
T A A A
T A A B
T A A B
F A A B

with this command:
./mahout trainclassifier -i /mnt/hgfs/C/daniel/my_fav_data/test -o model -type bayes -ng 1 -source hdfs

Then I test the classifier on the same data with:
./mahout testclassifier -d /mnt/hgfs/C/daniel/my_fav_data/test -m model -type bayes -ng 1 -source hdfs -method sequential -v

The classification I get is one I cannot understand: all records are
classified as F. Why is that? Shouldn't they all be classified as T?
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 0 Line(30): T A A A Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 1 Line(30): T A A A Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 2 Line(30): T A A A Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 3 Line(30): T A A B Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 4 Line(30): T A A B Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 5 Line(30): F A A B Expected Label: F Classified Label: F Correct: true

My reasoning (no smoothing applied):
Priors:
P(T) = 5/6
P(F) = 1/6

Likelihoods (word counts per class):
P(A|T) = 13/15
P(A|F) = 2/3

P(B|T) = 2/15
P(B|F) = 1/3

Then I calculate the posterior probability, e.g. P(T|A,A,B) = 0.7717, so the
record should be classified as T.
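
The hand calculation can be reproduced with a short script. This is only a minimal sketch of a textbook multinomial naive Bayes without smoothing; it is not Mahout's implementation, and the names `train` and `posterior` are illustrative, not part of any Mahout API.

```python
from collections import Counter
from fractions import Fraction

# The six training records from this message: (label, words).
records = [
    ("T", ["A", "A", "A"]),
    ("T", ["A", "A", "A"]),
    ("T", ["A", "A", "A"]),
    ("T", ["A", "A", "B"]),
    ("T", ["A", "A", "B"]),
    ("F", ["A", "A", "B"]),
]


def train(records):
    """Estimate priors and per-class word likelihoods with no smoothing."""
    label_counts = Counter(label for label, _ in records)
    word_counts = {label: Counter() for label in label_counts}
    for label, words in records:
        word_counts[label].update(words)
    priors = {
        label: Fraction(count, len(records))
        for label, count in label_counts.items()
    }
    likelihoods = {
        label: {
            word: Fraction(count, sum(counts.values()))
            for word, count in counts.items()
        }
        for label, counts in word_counts.items()
    }
    return priors, likelihoods


def posterior(words, priors, likelihoods):
    """Return P(label | words) for every label via Bayes' rule."""
    joint = {}
    for label, prior in priors.items():
        p = prior
        for word in words:
            # An unseen word gets probability 0 because no smoothing is applied.
            p *= likelihoods[label].get(word, Fraction(0))
        joint[label] = p
    total = sum(joint.values())
    return {label: p / total for label, p in joint.items()}


priors, likelihoods = train(records)
post = posterior(["A", "A", "B"], priors, likelihoods)
print(post["T"], float(post["T"]))
```

Using exact fractions avoids rounding, so the result can be compared directly with the figures above (13/15, 2/15 and so on).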

What is the reasoning behind classifying all records above as F?

Any help is much appreciated.

PS. I am using Mahout trunk from 16.01.2012.

Regards.
Daniel

-- 
Daniel Korzekwa
Software Engineer
priv: http://danmachine.com
blog: http://blog.danmachine.com
