mahout-dev mailing list archives

From "Robin Anil (JIRA)" <>
Subject [jira] Commented: (MAHOUT-220) Mahout Bayes Code cleanup
Date Tue, 29 Dec 2009 20:31:29 GMT


Robin Anil commented on MAHOUT-220:

The current Bayes implementation is an island. If you skim through the training mechanism,
you will see that it is heavily optimised (it uses the fewest map/reduce passes), and the kind
of information I store in HBase and in memory is very specific to the CBayes paper.

First, there is the weight matrix, with features as rows, labels as columns, and the weight
in each cell.
Second, there are the column sums and row sums, stored along with the weight matrix.
Then there are special rows containing the theta normalizer, the alpha smoothing value, etc.
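
The layout described above can be sketched roughly as follows. This is only an in-memory
illustration under my own assumptions: the class name BayesWeightSketch and all of its method
names are hypothetical, it is not the actual Mahout code, and it does not model the HBase
storage or the map/reduce training steps.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the stored model; names are illustrative, not Mahout's. */
class BayesWeightSketch {
  // Weight matrix: feature (row) -> label (column) -> weight (cell).
  private final Map<String, Map<String, Double>> weights = new HashMap<>();
  // Sums kept alongside the matrix: one per feature row, one per label column.
  private final Map<String, Double> rowSums = new HashMap<>();
  private final Map<String, Double> colSums = new HashMap<>();
  // "Special" entries: a theta normalizer per label and the alpha smoothing value.
  private final Map<String, Double> thetaNormalizers = new HashMap<>();
  private final double alpha;

  BayesWeightSketch(double alpha) {
    this.alpha = alpha;
  }

  /** Adds w to the cell (feature, label) and keeps both sums in step. */
  void addWeight(String feature, String label, double w) {
    weights.computeIfAbsent(feature, f -> new HashMap<>()).merge(label, w, Double::sum);
    rowSums.merge(feature, w, Double::sum);
    colSums.merge(label, w, Double::sum);
  }

  /** Returns the cell weight, or 0.0 if the cell was never written. */
  double weight(String feature, String label) {
    Map<String, Double> row = weights.get(feature);
    if (row == null) {
      return 0.0;
    }
    Double w = row.get(label);
    return w == null ? 0.0 : w;
  }

  double rowSum(String feature) {
    return rowSums.getOrDefault(feature, 0.0);
  }

  double colSum(String label) {
    return colSums.getOrDefault(label, 0.0);
  }

  void setThetaNormalizer(String label, double value) {
    thetaNormalizers.put(label, value);
  }

  double thetaNormalizer(String label) {
    return thetaNormalizers.getOrDefault(label, 0.0);
  }

  double alpha() {
    return alpha;
  }
}
```

The point of the sketch is that everything a classifier reads (cells, sums, normalizers,
alpha) is shaped around this one matrix, which is why the stored model is hard to reuse for a
differently structured model.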

You can see it's not really applying Bayes' rule; it is reproducing the math of the CBayes paper.
So I see no way for it to use the SGD model directly.

We could have a Bayes algorithm implementation specific to the model you are training.  If that's

> Mahout Bayes Code cleanup
> -------------------------
>                 Key: MAHOUT-220
>                 URL:
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>         Attachments: MAHOUT-BAYES.patch, MAHOUT-BAYES.patch
> Following Isabel's checkstyle, I am adding a whole slew of code cleanup with the following:
> 1.  Line length used is 120 instead of 80.
> 2.  The static final log is kept as is, not renamed to LOG.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
