mahout-dev mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: [jira] [Commented] (MAHOUT-746) Refactoring of the parallel Naive Bayes implementation in org.apache.mahout.classifier.naivebayes
Date Tue, 28 Jun 2011 22:54:35 GMT
That paper answered my questions, thank you Ted.

I'll rework the patch a little so that the variable names are more
consistent with the paper. I also think my colleague was right when he
suspected a tiny bug that only occurs when the smoothing parameter is
set to a value other than one.
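
As an illustration only: the thread does not say what the suspected bug is, so the following is a hypothetical sketch of how an error in additive smoothing can stay hidden while the smoothing parameter equals one. The function names and toy counts are assumptions made for this example and are not taken from the Mahout code or the patch.

```python
def smoothed_prob(count, total, vocab_size, alpha=1.0):
    """Additive smoothing: (count + alpha) / (total + alpha * vocab_size)."""
    return (count + alpha) / (total + alpha * vocab_size)


def smoothed_prob_buggy(count, total, vocab_size, alpha=1.0):
    """Hypothetical bug: the denominator ignores alpha.

    This agrees with smoothed_prob only when alpha == 1.
    """
    return (count + alpha) / (total + vocab_size)


# Toy per-class term counts (made up for the example).
counts = {"a": 3, "b": 1, "c": 0}
total = sum(counts.values())
vocab_size = len(counts)

for alpha in (1.0, 0.5):
    good = sum(smoothed_prob(c, total, vocab_size, alpha) for c in counts.values())
    bad = sum(smoothed_prob_buggy(c, total, vocab_size, alpha) for c in counts.values())
    # With the correct denominator the probabilities sum to 1 for any alpha;
    # with the buggy one they only do so when alpha is exactly 1.
    print(f"alpha={alpha}: correct sum={good:.4f}, buggy sum={bad:.4f}")
```

A test that only uses the default smoothing value of one would pass for both variants, which is one reason such a mistake could go unnoticed.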


On 29.06.2011 00:03, Ted Dunning wrote:
> Hmmm... not sure.  I thought they were all the same.  It is possible
> there is a left-over implementation.
>
> Robin?  Care to comment?
>
> On Tue, Jun 28, 2011 at 3:01 PM, Sebastian Schelter <ssc@apache.org> wrote:
>
>     Is org.apache.mahout.classifier.naivebayes also based on that one?
>     I thought it was only relevant for org.apache.mahout.classifier.bayes?
>
>
>     On 28.06.2011 23:58, Ted Dunning wrote:
>
>         See here:
>         http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1
>         <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1>
>
>         On Tue, Jun 28, 2011 at 2:43 PM, Sebastian Schelter (JIRA)
>         <jira@apache.org> wrote:
>
>
>                 [
>             https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805
>             <https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805>]
>
>             Sebastian Schelter commented on MAHOUT-746:
>             -------------------------------------------
>
>             Thank you very much, Sean.
>
>             I wonder whether there is some article/paper that describes
>             this particular
>             approach of implementing Naive Bayes? A colleague of mine
>             with a much deeper
>             statistics background and I took a look at the details of
>             the computation
>             today and we were left with some open questions.
>
>                 Refactoring of the parallel Naive Bayes implementation in
>
>             org.apache.mahout.classifier.naivebayes
>
>
>             -------------------------------------------------------------------------------------------------
>
>
>                                  Key: MAHOUT-746
>                                  URL:
>                 https://issues.apache.org/jira/browse/MAHOUT-746
>                 <https://issues.apache.org/jira/browse/MAHOUT-746>
>                              Project: Mahout
>                           Issue Type: Improvement
>                           Components: Classification
>                     Affects Versions: 0.6
>                             Reporter: Sebastian Schelter
>                             Assignee: Sebastian Schelter
>                              Fix For: 0.6
>
>                          Attachments: MAHOUT-746.patch
>
>
>                 I refactored the code in
>                 org.apache.mahout.classifier.naivebayes to
>
>             extend AbstractJob, decoupled the model serialization from
>             the job output,
>             extracted trainer classes and tried to clarify naming and
>             reduce code
>             complexity. I also added tests for the training M/R code as
>             well as a toy
>             integration test.
>
>                 It would be great if someone could review my patch to
>                 make sure I didn't
>
>             break anything.
>
>             --
>             This message is automatically generated by JIRA.
>             For more information on JIRA, see:
>             http://www.atlassian.com/software/jira
>             <http://www.atlassian.com/software/jira>
>
>
>
>
>
>

