mahout-commits mailing list archives

Subject [CONF] Apache Lucene Mahout > Bayesian
Date Wed, 22 Jul 2009 17:28:00 GMT
Space: Apache Lucene Mahout (
Page: Bayesian (

Edited by Robin Anil:
h1. Intro

Mahout currently has two implementations of Bayesian classifiers.  One is the traditional
Naive Bayes approach, and the other is called Complementary Naive Bayes.

h1. Implementations

[NaiveBayes] ([MAHOUT-9|])

[Complementary Naive Bayes] ([MAHOUT-60|])

The Naive Bayes implementations in Mahout follow the paper []
Before we get to the actual algorithm, let's discuss the terminology.

In an input set of classified documents, let:
j = 0 to N index the features
k = 0 to L index the labels

Normalized Frequency for a term (feature) in a document is the term frequency divided by the
root mean square of the term frequencies in that document.

Weight Normalized Tf for a given feature in a given label is the sum of the Normalized Frequency
of that feature across all the documents in the label.

Weight Normalized Tf-Idf for a given feature in a label is the Weight Normalized Tf multiplied
by the standard idf.
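
The three definitions above can be sketched in Python as follows. This is only an illustration, not Mahout's code: the function names are made up, the root mean square is assumed to be taken over the distinct terms of a document, and "standard idf" is assumed to mean log(number of documents / number of documents containing the term).

```python
import math
from collections import Counter, defaultdict


def normalized_frequencies(tokens):
    """Term frequency divided by the root mean square of the document's term frequencies."""
    counts = Counter(tokens)
    # Assumption: the mean is taken over the distinct terms in the document.
    rms = math.sqrt(sum(c * c for c in counts.values()) / len(counts))
    return {term: c / rms for term, c in counts.items()}


def weight_normalized_tfidf(labeled_docs):
    """labeled_docs is a list of (label, tokens) pairs; returns {label: {term: weight}}."""
    wn_tf = defaultdict(lambda: defaultdict(float))
    doc_freq = Counter()
    n_docs = 0
    for label, tokens in labeled_docs:
        if not tokens:
            continue  # an empty document contributes nothing
        n_docs += 1
        nf = normalized_frequencies(tokens)
        for term, value in nf.items():
            # Weight Normalized Tf: sum over the documents of the label
            wn_tf[label][term] += value
        doc_freq.update(nf.keys())
    # Assumption: idf = log(n_docs / document frequency)
    return {
        label: {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in terms.items()}
        for label, terms in wn_tf.items()
    }
```

Note that with this idf, a term that occurs in every document gets a W-N-Tf-Idf of zero in every label.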

Once the Weight Normalized Tf-Idf (W-N-Tf-Idf) is calculated, the final weight matrices for Bayes
and CBayes are calculated as follows.

We calculate the sum of W-N-Tf-Idf over all the features in a label, called Sigma_k or sumLabelWeight.

For Bayes

Weight = Log [ ( W-N-Tf-Idf + alpha_i ) / ( Sigma_k + N  ) ]
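
As a rough sketch of the Bayes formula above (not Mahout's implementation), the weights can be computed from a nested dictionary of W-N-Tf-Idf values. The function name and the input layout are hypothetical, alpha_i is assumed to be a single smoothing constant defaulting to 1.0, and N is taken to be the number of distinct features.

```python
import math


def bayes_weights(wn_tfidf, alpha=1.0):
    """wn_tfidf is {label: {feature: W-N-Tf-Idf}}; returns {label: {feature: weight}}."""
    # N: number of distinct features seen in any label
    features = sorted({f for terms in wn_tfidf.values() for f in terms})
    n = len(features)
    weights = {}
    for label, terms in wn_tfidf.items():
        sigma_k = sum(terms.values())  # sumLabelWeight for this label
        weights[label] = {
            # a feature missing from a label has W-N-Tf-Idf 0 and is smoothed by alpha
            f: math.log((terms.get(f, 0.0) + alpha) / (sigma_k + n))
            for f in features
        }
    return weights
```

With alpha = 1.0 the exponentiated weights of a label sum to 1, since the numerators add up to Sigma_k + N.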

For CBayes

We calculate the sum of W-N-Tf-Idf across all labels for a given feature. We call this sumFeatureWeight
or Sigma_j.
We also sum the W-N-Tf-Idf weights over every (feature, label) pair in the training set. We call
this Sigma_jSigma_k.

Final Weight is calculated as

Weight = Log [ ( Sigma_j - W-N-Tf-Idf + alpha_i ) / ( Sigma_jSigma_k - Sigma_k + N  ) ]
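
The CBayes formula above can be sketched the same way. Again this is an illustration under assumptions, not Mahout's code: the function name and the {label: {feature: value}} input layout are hypothetical, alpha_i is treated as one constant, and N is the number of distinct features.

```python
import math


def cbayes_weights(wn_tfidf, alpha=1.0):
    """wn_tfidf is {label: {feature: W-N-Tf-Idf}}; returns {label: {feature: weight}}."""
    features = sorted({f for terms in wn_tfidf.values() for f in terms})
    n = len(features)
    # Sigma_j (sumFeatureWeight): sum over all labels for each feature
    sigma_j = {f: sum(terms.get(f, 0.0) for terms in wn_tfidf.values()) for f in features}
    # Sigma_k (sumLabelWeight): sum over all features for each label
    sigma_k = {label: sum(terms.values()) for label, terms in wn_tfidf.items()}
    # Sigma_jSigma_k: sum over every (feature, label) pair
    total = sum(sigma_k.values())
    weights = {}
    for label, terms in wn_tfidf.items():
        weights[label] = {
            # numerator and denominator use the mass of every label except this one
            f: math.log((sigma_j[f] - terms.get(f, 0.0) + alpha) / (total - sigma_k[label] + n))
            for f in features
        }
    return weights
```

Each weight is built from the complement of the label, that is, from all the other labels. With exactly two labels, the value for one label therefore has the same form as the Bayes weight of the other label.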

h1. Examples

In Mahout's example code, there are two samples that can be used:

# [WikipediaBayesExample] - Classify Wikipedia data.

# [TwentyNewsGroups] - Classify the classic Twenty Newsgroups data.
