Message-ID: <4E0A5BAB.4080903@apache.org>
Date: Wed, 29 Jun 2011 00:54:35 +0200
From: Sebastian Schelter
To: Ted Dunning
CC: dev@mahout.apache.org
Subject: Re: [jira] [Commented] (MAHOUT-746) Refactoring of the parallel Naive Bayes implementation in org.apache.mahout.classifier.naivebayes

That paper answered my questions, thank you, Ted. I'll rework the patch a little to use variable names that are more consistent with the paper. I also think my colleague was right to suspect a tiny bug that only occurs when a smoothing parameter other than one is used.

On 29.06.2011 00:03, Ted Dunning wrote:
> Hmmm... not sure. I thought they were all the same. It is possible
> there is a leftover implementation.
>
> Robin? Care to comment?
>
> On Tue, Jun 28, 2011 at 3:01 PM, Sebastian Schelter wrote:
>
> > Is org.apache.mahout.classifier.naivebayes also based on that one?
> > I thought it was only relevant for org.apache.mahout.classifier.bayes?
> >
> > On 28.06.2011 23:58, Ted Dunning wrote:
> >
> > > See here:
> > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1
> > >
> > > On Tue, Jun 28, 2011 at 2:43 PM, Sebastian Schelter (JIRA) wrote:
> > >
> > > > [ https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805 ]
> > > >
> > > > Sebastian Schelter commented on MAHOUT-746:
> > > > -------------------------------------------
> > > >
> > > > Thank you very much, Sean.
> > > >
> > > > I wonder whether there is some article/paper that describes this
> > > > particular approach to implementing Naive Bayes. A colleague of mine
> > > > with a much deeper statistics background and I took a look at the
> > > > details of the computation today, and we were left with some open
> > > > questions.
> > > > >
> > > > > Refactoring of the parallel Naive Bayes implementation in org.apache.mahout.classifier.naivebayes
> > > > > -------------------------------------------------------------------------------------------------
> > > > >
> > > > >                 Key: MAHOUT-746
> > > > >                 URL: https://issues.apache.org/jira/browse/MAHOUT-746
> > > > >             Project: Mahout
> > > > >          Issue Type: Improvement
> > > > >          Components: Classification
> > > > >    Affects Versions: 0.6
> > > > >            Reporter: Sebastian Schelter
> > > > >            Assignee: Sebastian Schelter
> > > > >             Fix For: 0.6
> > > > >
> > > > >         Attachments: MAHOUT-746.patch
> > > > >
> > > > > I refactored the code in org.apache.mahout.classifier.naivebayes to
> > > > > extend AbstractJob, decoupled the model serialization from the job
> > > > > output, extracted trainer classes and tried to clarify naming and
> > > > > reduce code complexity. I also added tests for the training M/R code
> > > > > as well as a toy integration test.
> > > > >
> > > > > It would be great if someone could review my patch to make sure I
> > > > > didn't break anything.
> > > >
> > > > --
> > > > This message is automatically generated by JIRA.
> > > > For more information on JIRA, see: http://www.atlassian.com/software/jira
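The reply above mentions a suspected bug that only shows up when the smoothing parameter is something other than one. The thread does not say what that bug actually is, so the following is a purely hypothetical illustration. It assumes the usual additive (Lidstone) estimate for one label of a multinomial Naive Bayes model, θ_i = (N_i + α) / (N + α·|V|), where N_i is the count of feature i, N is the label's total count and |V| is the number of features. It shows one way such a bug could stay hidden: a denominator that adds |V| instead of α·|V| gives exactly the same result at α = 1 and stops producing a proper distribution for any other α. The class name, method names and the buggyDenominator flag are invented for this sketch and are not part of Mahout.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this is not the MAHOUT-746 code. */
public final class AdditiveSmoothingSketch {

  private AdditiveSmoothingSketch() {}

  /**
   * Additive (Lidstone) smoothing of one label's feature counts:
   * theta_i = (count_i + alpha) / (total + alpha * numFeatures).
   * If buggyDenominator is true, the denominator uses total + numFeatures
   * instead, which coincides with the correct form only when alpha == 1.
   */
  public static double[] smoothed(long[] counts, double alpha, boolean buggyDenominator) {
    long total = 0;
    for (long c : counts) {
      total += c;
    }
    // Pseudo-counts added to the denominator: alpha per feature (correct) or 1 per feature (bug).
    double pseudoCounts = buggyDenominator ? counts.length : alpha * counts.length;
    double denominator = total + pseudoCounts;
    double[] theta = new double[counts.length];
    for (int i = 0; i < counts.length; i++) {
      theta[i] = (counts[i] + alpha) / denominator;
    }
    return theta;
  }

  /** Sum of the estimates; a proper probability distribution sums to 1. */
  public static double sum(double[] values) {
    double s = 0.0;
    for (double v : values) {
      s += v;
    }
    return s;
  }

  public static void main(String[] args) {
    long[] counts = {3, 1, 0}; // hypothetical term counts for one label
    for (double alpha : new double[] {1.0, 0.5}) {
      double[] correct = smoothed(counts, alpha, false);
      double[] buggy = smoothed(counts, alpha, true);
      System.out.printf("alpha=%.1f correct=%s (sum %.4f) buggy=%s (sum %.4f)%n",
          alpha, Arrays.toString(correct), sum(correct),
          Arrays.toString(buggy), sum(buggy));
    }
  }
}
```

With the hypothetical counts {3, 1, 0}, both variants return 4/7, 2/7 and 1/7 at α = 1, so a test suite that only uses the default smoothing would not notice anything. At α = 0.5 the correct estimates still sum to 1, while the buggy ones sum to 5.5/7 ≈ 0.786.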