spark-issues mailing list archives

From "RJ Nowling (JIRA)" <>
Subject [jira] [Commented] (SPARK-4894) Add Bernoulli-variant of Naive Bayes
Date Wed, 14 Jan 2015 20:40:34 GMT


RJ Nowling commented on SPARK-4894:

Thanks [~lmcguire]!  I'll wait until next week in case you have time to put a patch together.

In the meantime, here are my thoughts for changes:
1. Add an optional `model` parameter to the `NaiveBayes` object and class, and to `NaiveBayesModel`.
It would be a string with a default value of `Multinomial`.  For Bernoulli, we can use `Bernoulli`.

2. In `NaiveBayesModel.predict`, we should compute and store `brzPi + brzTheta * testData.toBreeze`.
If `testData(i)` is 0, then feature `i` contributes 0 to `brzTheta * testData.toBreeze`. If Bernoulli is enabled,
we add `log(1 - exp(brzTheta)) * (1 - testData.toBreeze)` to account for the probabilities
of the 0-valued features. (Breeze may not allow adding/subtracting scalars and vectors/matrices.)

In the current model, no term is added for entries of `testData` that are 0.  In the
Bernoulli model, we would be adding a separate term for 0-valued features.
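
To make the two changes above concrete, here is a rough sketch in plain Python (not Scala/Breeze, so it stays self-contained). The function name `predict_scores` and the list-based layout are my own assumptions for illustration, not MLlib API. `pi` holds the log class priors and `theta` the log feature probabilities per class, matching `brzPi` and `brzTheta`. It assumes every probability is strictly below 1 (for example because of smoothing), since otherwise `log(1 - exp(theta))` is undefined.

```python
import math

def predict_scores(pi, theta, x, model="Multinomial"):
    """Return one unnormalized log score per class.

    pi    -- list of log class priors, one per class
    theta -- list of rows, theta[c][i] = log P(feature i | class c)
    x     -- feature vector (0/1 values for the Bernoulli case)
    model -- "Multinomial" (default) or "Bernoulli"
    """
    if model not in ("Multinomial", "Bernoulli"):
        raise ValueError("unknown model type: %r" % (model,))

    scores = []
    for c in range(len(pi)):
        # Shared part: pi + theta * x. Features with x[i] == 0 add nothing here.
        score = pi[c] + sum(t * v for t, v in zip(theta[c], x))
        if model == "Bernoulli":
            # Extra term: log(1 - exp(theta)) * (1 - x), which is nonzero
            # only for the 0-valued features. Requires exp(t) < 1.
            score += sum(math.log1p(-math.exp(t)) * (1.0 - v)
                         for t, v in zip(theta[c], x))
        scores.append(score)
    return scores
```

The predicted label would then be the index of the largest score. Validating the string up front is one option; an enumeration would also work if we prefer not to pass raw strings around.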

Here is the sklearn source for comparison:

Note that sklearn adds the "neg prob" term (the log probability of a feature being 0) for all features and then subtracts it for features with value 1.
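
As far as I can tell, that rearrangement gives the same result as adding the term only for 0-valued features. Here is a small sketch to check this numerically for a single class; the function names `direct_form` and `rearranged_form` are hypothetical and are not taken from sklearn or MLlib, and `theta_c` is assumed to hold log probabilities strictly below 0.

```python
import math

def _neg(theta_c):
    # log(1 - p) for each feature, given theta_c[i] = log(p_i) with p_i < 1
    return [math.log1p(-math.exp(t)) for t in theta_c]

def direct_form(pi_c, theta_c, x):
    # pi + sum_i [ theta_i * x_i + log(1 - p_i) * (1 - x_i) ]
    neg = _neg(theta_c)
    return pi_c + sum(t * v + n * (1.0 - v)
                      for t, n, v in zip(theta_c, neg, x))

def rearranged_form(pi_c, theta_c, x):
    # pi + sum_i log(1 - p_i) + sum_i (theta_i - log(1 - p_i)) * x_i
    # i.e. add the "neg prob" for every feature, then subtract it
    # again wherever x_i is 1.
    neg = _neg(theta_c)
    return pi_c + sum(neg) + sum((t - n) * v
                                 for t, n, v in zip(theta_c, neg, x))
```

The second form may be the better fit for sparse inputs, because the sum over all features does not depend on `x` and could be computed once per class.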

[~mengxr], [~josephkb] Any thoughts or comments?

> Add Bernoulli-variant of Naive Bayes
> ------------------------------------
>                 Key: SPARK-4894
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: RJ Nowling
>            Assignee: RJ Nowling
> MLlib only supports the multinomial variant of Naive Bayes.  The Bernoulli version of Naive Bayes is more useful for situations where the features are binary values.

This message was sent by Atlassian JIRA
