Hello,
I'm training a Bayes classifier on the following data (6 records):
target, words
T A A A
T A A A
T A A A
T A A B
T A A B
F A A B
with the command:
./mahout trainclassifier -i /mnt/hgfs/C/daniel/my_fav_data/test -o model \
  -type bayes -ng 1 -source hdfs
Then I test the classifier against the same data with:
./mahout testclassifier -d /mnt/hgfs/C/daniel/my_fav_data/test -m model \
  -type bayes -ng 1 -source hdfs -method sequential -v
and I get classifications I cannot understand. All records are
classified as F. Why is that? Shouldn't they all be classified as T?
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 0 Line(30): T A A A Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 1 Line(30): T A A A Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 2 Line(30): T A A A Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 3 Line(30): T A A B Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 4 Line(30): T A A B Expected Label: T Classified Label: F Correct: false
12/01/18 11:07:55 INFO bayes.TestClassifier: Line Number: 5 Line(30): F A A B Expected Label: F Classified Label: F Correct: true
My reasoning (no smoothing applied):
Priors:
P(T) = 5/6
P(F) = 1/6
Likelihoods (word counts per class):
P(A|T) = 13/15
P(A|F) = 2/3
P(B|T) = 2/15
P(B|F) = 1/3
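The estimates above can be reproduced by plain counting. The following is a minimal Python sketch of unsmoothed maximum-likelihood estimation for a multinomial naive Bayes model. It is not Mahout's implementation, and the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# The six training records from this message: (label, words).
records = [
    ("T", ["A", "A", "A"]),
    ("T", ["A", "A", "A"]),
    ("T", ["A", "A", "A"]),
    ("T", ["A", "A", "B"]),
    ("T", ["A", "A", "B"]),
    ("F", ["A", "A", "B"]),
]

# Class priors: fraction of records carrying each label.
label_counts = Counter(label for label, _ in records)
priors = {label: Fraction(n, len(records)) for label, n in label_counts.items()}

# Word likelihoods: fraction of a class's word occurrences that are a given word.
# No smoothing is applied (assumption stated in the message).
word_counts = {label: Counter() for label in label_counts}
for label, words in records:
    word_counts[label].update(words)

likelihoods = {
    label: {w: Fraction(c, sum(counts.values())) for w, c in counts.items()}
    for label, counts in word_counts.items()
}

print(priors["T"], priors["F"])                      # 5/6 1/6
print(likelihoods["T"]["A"], likelihoods["T"]["B"])  # 13/15 2/15
print(likelihoods["F"]["A"], likelihoods["F"]["B"])  # 2/3 1/3
```

Exact fractions are used so the output can be compared directly with the hand-computed values.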
Then I calculate the posterior probability, e.g. P(T|A,A,B) = 0.7717, so
the record is classified as T.
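The posterior can be checked with a short Python sketch of Bayes' rule under the naive independence assumption. It hard-codes the unsmoothed estimates from this message and is not meant to mirror what Mahout does internally; the function name `posterior` is hypothetical.

```python
from fractions import Fraction

# Unsmoothed estimates taken from the message.
prior = {"T": Fraction(5, 6), "F": Fraction(1, 6)}
likelihood = {
    "T": {"A": Fraction(13, 15), "B": Fraction(2, 15)},
    "F": {"A": Fraction(2, 3), "B": Fraction(1, 3)},
}

def posterior(words):
    """Return P(label | words) for each label, assuming independent words."""
    joint = {}
    for label in prior:
        p = prior[label]
        for w in words:
            p *= likelihood[label][w]
        joint[label] = p
    total = sum(joint.values())  # normalising constant P(words)
    return {label: p / total for label, p in joint.items()}

post = posterior(["A", "A", "B"])
print(round(float(post["T"]), 4))  # 0.7717
```

Under these estimates the record A A B gets a higher posterior for T than for F, which is why the all-F output is surprising.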
What is the reasoning behind classifying all records above as F?
Any help is much appreciated.
PS. I was using Mahout trunk from 16.01.2012.
Regards.
Daniel
--
Daniel Korzekwa
Software Engineer
priv: http://danmachine.com
blog: http://blog.danmachine.com