mahout-dev mailing list archives

From "Andrew Palumbo (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-1369) Why is theta normalization for naive bayes classification commented out?
Date Sun, 30 Mar 2014 22:08:15 GMT


Andrew Palumbo commented on MAHOUT-1369:

From what I can see, looking into this a bit more today, the original paper (Rennie et al.)
focuses on four Naive Bayes models: Multinomial Naive Bayes (MNB), Complement Naive Bayes (CNB),
Weight-normalized Complement Naive Bayes (WCNB), and Transformed Weight-normalized Complement
Naive Bayes (TWCNB).  The current Mahout NB implementation so far seems to cover only:

MNB (trainnb/testnb ...)
CNB (trainnb/testnb -c ...)

Theta normalization is only called for WCNB and TWCNB, so its being commented out doesn't affect
MNB or CNB.
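
To make the distinction concrete, here is a minimal Python sketch of the MNB and CNB weight estimates as I read them in Rennie et al. It is not Mahout's code: the function names, the `counts[c][i]` layout (occurrences of term i in class c), and the uniform smoothing value `alpha` are assumptions for illustration, and the class prior term is left out for brevity. Neither estimate involves a weight normalization step.

```python
import math


def mnb_weights(counts, alpha=1.0):
    """MNB: w[c][i] = log((N_ci + alpha) / (N_c + alpha * n_terms))."""
    n_terms = len(counts[0])
    weights = []
    for row in counts:
        total = sum(row) + alpha * n_terms
        weights.append([math.log((n + alpha) / total) for n in row])
    return weights


def cnb_weights(counts, alpha=1.0):
    """CNB: same estimate, but built from the counts of every class except c."""
    n_terms = len(counts[0])
    col_totals = [sum(row[i] for row in counts) for i in range(n_terms)]
    grand_total = sum(col_totals)
    weights = []
    for row in counts:
        comp_total = grand_total - sum(row) + alpha * n_terms
        weights.append([
            math.log((col_totals[i] - row[i] + alpha) / comp_total)
            for i in range(n_terms)
        ])
    return weights


def score(weights_row, doc):
    """Dot product of a document's term frequencies with one class's weights."""
    return sum(f * w for f, w in zip(doc, weights_row))


def classify_mnb(weights, doc):
    # MNB picks the class whose own model fits the document best.
    return max(range(len(weights)), key=lambda c: score(weights[c], doc))


def classify_cnb(weights, doc):
    # CNB picks the class whose complement fits the document worst.
    return min(range(len(weights)), key=lambda c: score(weights[c], doc))
```

With two classes that each own one term, for example `counts = [[3, 0], [0, 3]]`, both classifiers assign the document `[2, 0]` to class 0.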

It seems that the call to the thetaSummer job is commented out because the weight normalization/transformation
implementation is incomplete.  As far as I can tell, the MNB and CNB classifiers calculate
their weights correctly.

If the goal is to stick to the Rennie implementations, I think that once the thetaSummer job
(or whatever turns out to be the problem with weight normalization/transformation) is corrected/completed,
it should only be called when a separate option is supplied, something like:

trainnb/testnb -wcnb (WCNB) 
trainnb/testnb -twcnb (TWCNB)
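
As a rough illustration of gating the normalization behind such an option, the Python sketch below applies the weight normalization from Rennie et al. (each class's weights divided by the sum of their absolute values) only for the normalized variants. The `mode` strings simply mirror the option names proposed above and, like the function names, are hypothetical; none of this exists in Mahout. For TWCNB the paper also transforms the term frequencies before training, which is not shown here.

```python
def normalize_weights(weights):
    """Divide each class's weights by the sum of their absolute values."""
    normalized = []
    for row in weights:
        norm = sum(abs(w) for w in row)
        # Leave an all-zero row untouched rather than divide by zero.
        normalized.append([w / norm for w in row] if norm else list(row))
    return normalized


def finalize_weights(weights, mode):
    """Apply weight normalization only when the selected variant calls for it."""
    if mode in ("wcnb", "twcnb"):
        return normalize_weights(weights)
    if mode in ("mnb", "cnb"):
        # Plain MNB/CNB: return a copy of the weights, unnormalized.
        return [list(row) for row in weights]
    raise ValueError("unknown mode: " + mode)
```

For example, `finalize_weights([[-1.0, -3.0]], "wcnb")` gives `[[-0.25, -0.75]]`, while the same call with `"cnb"` returns the weights unchanged.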

I also just noticed that the Mahout website says that the TWCNB implementation is what's being
called in Mahout's complementary naive bayes:

However, I believe that the CNB implementation is what's really being called here.

I think that there is more going on here as well: the weight summer may need to be called
in a different order. I will continue to look into this over this week.


> Why is theta normalization for naive bayes classification commented out?
> ------------------------------------------------------------------------
>                 Key: MAHOUT-1369
>                 URL:
>             Project: Mahout
>          Issue Type: Question
>          Components: Classification
>    Affects Versions: 0.7, 0.8, 0.9
>         Environment: mahout 0.8
>            Reporter: utku yaman
>            Priority: Minor
>              Labels: features
>             Fix For: 1.0
> TrainNaiveBayesJob line 155:158
> and
> BayesUtils line 86:93
> are commented out and these lines are for theta normalization for bayes.
> what is the problem with the code and is there a plan for correcting these methods.
