mahout-dev mailing list archives

From "Maurizio (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-9) Implement MapReduce BayesianClassifier
Date Sat, 28 Jun 2008 02:11:45 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608964#action_12608964 ]

Maurizio commented on MAHOUT-9:
-------------------------------

Hi Grant,
I'm developing something similar to your application, and I found your code really interesting.
I may be missing something, but I don't think your Bayesian approach works well.
Specifically, weightedFeatureProbability computes:
 
((weight * defaultProb) + (totalNumSeen * unweighted)) / (weight + totalNumSeen)
where unweighted = numSeen / labelCount,
numSeen = number of times the feature has been seen within the given label,
and labelCount = number of features under the label.

If you look at how this curve behaves, you will see that:
- terms never seen before are "heavier" than the others.
- unweighted is a very small number, so its contribution to the probability is insignificant.
Moreover, for a widespread term the numerator grows more slowly than the denominator.
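
The behaviour described above can be checked numerically. The following is a minimal sketch of the formula in Python (not the Java code from the patch). The message does not define totalNumSeen, so it is assumed here to be the total number of times the feature was seen; weight = 1.0, defaultProb = 0.5 and all the counts are made-up values chosen only for illustration.

```python
def weighted_feature_probability(num_seen, label_count, total_num_seen,
                                 weight=1.0, default_prob=0.5):
    """Formula quoted in the message.

    weight and default_prob are assumed defaults, not values taken
    from the patch. total_num_seen is assumed to be the total number
    of times the feature has been observed.
    """
    # unweighted = numSeen / labelCount (guard against an empty label)
    unweighted = num_seen / label_count if label_count else 0.0
    return ((weight * default_prob) + (total_num_seen * unweighted)) / (
        weight + total_num_seen
    )


# Hypothetical counts: a label containing 10,000 features in total.
LABEL_COUNT = 10_000

unseen = weighted_feature_probability(0, LABEL_COUNT, 0)        # never seen
rare = weighted_feature_probability(5, LABEL_COUNT, 5)          # seen 5 times
common = weighted_feature_probability(500, LABEL_COUNT, 500)    # seen 500 times

for name, value in (("unseen", unseen), ("rare", rare), ("common", common)):
    print(f"{name:>6}: {value:.6f}")
```

Under these assumptions the unseen term simply receives defaultProb, which is larger than the value for either seen term, and the result for a seen term moves toward the small unweighted value as totalNumSeen grows.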

What do you think?
 


> Implement MapReduce BayesianClassifier
> --------------------------------------
>
>                 Key: MAHOUT-9
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-9
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: MAHOUT-9.patch, MAHOUT-9.patch, MAHOUT-9.patch, MAHOUT-9.patch, MAHOUT-9.patch
>
>
> Implement a Bayesian classifier using M/R.
> I have a simple trainer done (not M/R) and will implement the classifier soon, then will upgrade it to use Hadoop.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

