mahout-dev mailing list archives

From "Maurizio (JIRA)" <>
Subject [jira] Commented: (MAHOUT-9) Implement MapReduce BayesianClassifier
Date Sat, 28 Jun 2008 02:11:45 GMT


Maurizio commented on MAHOUT-9:

Hi Grant,
I'm developing something similar to your application and found your code really interesting.
I may be missing something, but I don't think your Bayesian approach works correctly.
Specifically, weightedFeatureProbability computes:
((weight * defaultProb) + (totalNumSeen * unweighted)) / (weight + totalNumSeen)
where unweighted = numSeen / labelCount
and, in turn,
numSeen = number of times the feature has been seen within the given label
labelCount = number of features under the label

If you look at how this curve behaves, you will see that:
- terms never seen before are "heavier" than others.
- unweighted is a very small number, so its contribution to the probability is insignificant.
Moreover, for a widespread term the numerator grows more slowly than the denominator.
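
To make the trend concrete, here is a minimal sketch of the formula in Python (not the Mahout code itself, which is Java). It assumes that totalNumSeen is the number of times the feature was seen across all labels, which this message does not define. The function name, the weight=1.0 and defaultProb=0.5 defaults, and the counts below are illustrative assumptions only.

```python
def weighted_feature_probability(num_seen, label_count, total_num_seen,
                                 weight=1.0, default_prob=0.5):
    """Weighted probability as written in the comment above.

    weight and default_prob defaults are assumptions made for illustration.
    """
    # unweighted = numSeen / labelCount; guard against an empty label
    unweighted = num_seen / label_count if label_count else 0.0
    return ((weight * default_prob) + (total_num_seen * unweighted)) / (
        weight + total_num_seen
    )


# Hypothetical counts: a label holding 10000 features in total.
unseen = weighted_feature_probability(0, 10000, 0)            # never seen anywhere
rare = weighted_feature_probability(5, 10000, 5)              # seen 5 times
widespread = weighted_feature_probability(500, 10000, 1000)   # seen often

for name, p in [("unseen", unseen), ("rare", rare), ("widespread", widespread)]:
    print(f"{name:>10}: {p:.5f}")
```

Under these assumptions an unseen term gets exactly defaultProb, because totalNumSeen = 0 cancels the second term. Any term that has been seen is pulled toward the much smaller unweighted value, so it ends up with a lower probability than a term that was never observed.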

What do you think?

> Implement MapReduce BayesianClassifier
> --------------------------------------
>                 Key: MAHOUT-9
>                 URL:
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>         Attachments: MAHOUT-9.patch, MAHOUT-9.patch, MAHOUT-9.patch, MAHOUT-9.patch,
> Implement a Bayesian classifier using M/R.
> I have a simple trainer done (not M/R) and will implement the classifier soon, then will upgrade it to use Hadoop.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
