Space: Apache Lucene Mahout (http://cwiki.apache.org/confluence/display/MAHOUT)
Page: Bayesian (http://cwiki.apache.org/confluence/display/MAHOUT/Bayesian)
Edited by Robin Anil:

h1. Intro
Mahout currently has two implementations of Bayesian classifiers. One is the traditional
Naive Bayes approach, and the other is called Complementary Naive Bayes.
h1. Implementations
[NaiveBayes] ([MAHOUT-9|http://issues.apache.org/jira/browse/MAHOUT-9])
[Complementary Naive Bayes] ([MAHOUT-60|http://issues.apache.org/jira/browse/MAHOUT-60])
The Naive Bayes implementations in Mahout follow the paper [http://people.csail.mit.edu/jrennie/papers/icml03nb.pdf].
Before we get to the actual algorithm, let's discuss the terminology.
{noformat}
Given, in an input set of classified documents:
j = 0 to N features
k = 0 to L labels
{noformat}
# Normalized Frequency for a term (feature) in a document is the term frequency divided by the root mean square of the term frequencies in that document.
# Weight Normalized Tf for a given feature in a given label is the sum of the Normalized Frequency of that feature across all the documents in the label.
# Weight Normalized TfIdf for a given feature in a label is the Weight Normalized Tf multiplied by the standard idf.
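The three definitions above can be sketched in Python. This is a minimal illustration, not Mahout's implementation: the function names and the input layout are hypothetical, "standard idf" is assumed to mean log(number of documents / document frequency), and "root mean square" is taken literally from the text.

```python
import math
from collections import Counter, defaultdict


def normalized_frequencies(tokens):
    """Term frequency divided by the RMS of the document's term frequencies."""
    tf = Counter(tokens)
    if not tf:
        return {}
    rms = math.sqrt(sum(c * c for c in tf.values()) / len(tf))
    return {term: count / rms for term, count in tf.items()}


def weight_normalized_tfidf(docs):
    """docs: list of (label, tokens) pairs. Returns {label: {feature: WNTfIdf}}."""
    wn_tf = defaultdict(lambda: defaultdict(float))
    doc_freq = Counter()
    for label, tokens in docs:
        for term, nf in normalized_frequencies(tokens).items():
            wn_tf[label][term] += nf  # Weight Normalized Tf: sum over the label's documents
            doc_freq[term] += 1       # number of documents containing the term
    n_docs = len(docs)
    # Assumed idf form: log(n_docs / doc_freq); other idf variants would also fit the text.
    return {
        label: {t: w * math.log(n_docs / doc_freq[t]) for t, w in feats.items()}
        for label, feats in wn_tf.items()
    }
```

With this idf form, a term that occurs in every document gets a WNTfIdf of zero, so only the smoothing term contributes to its final weight.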
Once the Weight Normalized TfIdf (WNTfIdf) is calculated, the final weight matrices for Bayes and CBayes are calculated as follows.
First, we calculate the sum of WNTfIdf over all the features in a label. This is called Sigma_k or sumLabelWeight.
For Bayes:
{noformat}
Weight = Log [ ( WNTfIdf + alpha_i ) / ( Sigma_k + N ) ]
{noformat}
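As a rough sketch of the Bayes formula, the following Python function computes one weight per (label, feature) pair. The function name, the nested-dict input layout, and the default alpha_i = 1.0 are assumptions made for illustration; the page does not state the value of alpha_i, and this is not Mahout's code.

```python
import math


def bayes_weights(wn_tfidf, features, alpha_i=1.0):
    """wn_tfidf: {label: {feature: WNTfIdf}}. Returns {label: {feature: weight}}."""
    n = len(features)  # N, the number of features
    weights = {}
    for label, feats in wn_tfidf.items():
        # Sigma_k (sumLabelWeight): sum of WNTfIdf over all features in the label
        sigma_k = sum(feats.get(f, 0.0) for f in features)
        weights[label] = {
            f: math.log((feats.get(f, 0.0) + alpha_i) / (sigma_k + n))
            for f in features
        }
    return weights
```

A feature that never occurs in a label is treated as having a WNTfIdf of 0, so its weight reduces to Log [ alpha_i / ( Sigma_k + N ) ].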
For CBayes:
We calculate the sum of WNTfIdf across all labels for a given feature. We call this sumFeatureWeight or Sigma_j.
We also sum the WNTfIdf weights over all (feature, label) pairs in the training set. We call this Sigma_jSigma_k.
The final weight is calculated as:
{noformat}
Weight = Log [ ( Sigma_j - WNTfIdf + alpha_i ) / ( Sigma_jSigma_k - Sigma_k + N ) ]
{noformat}
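The CBayes weight can be sketched the same way. In the Python below, the function name, the nested-dict input layout, and the default alpha_i = 1.0 are illustrative assumptions rather than Mahout's API. The formula is read with subtraction, that is, the feature total minus the label's own WNTfIdf over the grand total minus the label's own sum.

```python
import math


def cbayes_weights(wn_tfidf, features, alpha_i=1.0):
    """wn_tfidf: {label: {feature: WNTfIdf}}. Returns {label: {feature: weight}}."""
    n = len(features)  # N, the number of features
    # Sigma_j (sumFeatureWeight): sum of WNTfIdf across all labels for each feature
    sigma_j = {
        f: sum(feats.get(f, 0.0) for feats in wn_tfidf.values()) for f in features
    }
    # Sigma_k (sumLabelWeight): sum of WNTfIdf across all features for each label
    sigma_k = {
        label: sum(feats.get(f, 0.0) for f in features)
        for label, feats in wn_tfidf.items()
    }
    # Sigma_jSigma_k: sum over every (feature, label) pair
    sigma_j_sigma_k = sum(sigma_k.values())
    return {
        label: {
            f: math.log(
                (sigma_j[f] - feats.get(f, 0.0) + alpha_i)
                / (sigma_j_sigma_k - sigma_k[label] + n)
            )
            for f in features
        }
        for label, feats in wn_tfidf.items()
    }
```

The numerator and denominator are built from the weights of every label except the one being scored, which is what makes the weight complementary.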
h1. Examples
In Mahout's example code, there are two samples that can be used:
# [WikipediaBayesExample] - Classify Wikipedia data.
# [TwentyNewsGroups] - Classify the classic Twenty Newsgroups data.
