Return-Path:
Delivered-To: apmail-lucene-mahout-dev-archive@locus.apache.org
Received: (qmail 27327 invoked from network); 13 Aug 2008 00:12:05 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 13 Aug 2008 00:12:05 -0000
Received: (qmail 98933 invoked by uid 500); 13 Aug 2008 00:12:04 -0000
Delivered-To: apmail-lucene-mahout-dev-archive@lucene.apache.org
Received: (qmail 98910 invoked by uid 500); 13 Aug 2008 00:12:04 -0000
Mailing-List: contact mahout-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: mahout-dev@lucene.apache.org
Delivered-To: mailing list mahout-dev@lucene.apache.org
Received: (qmail 98895 invoked by uid 99); 13 Aug 2008 00:12:04 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 12 Aug 2008 17:12:04 -0700
X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 13 Aug 2008 00:11:16 +0000
Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 6EEC9234C1B5 for ; Tue, 12 Aug 2008 17:11:44 -0700 (PDT)
Message-ID: <797641376.1218586304453.JavaMail.jira@brutus>
Date: Tue, 12 Aug 2008 17:11:44 -0700 (PDT)
From: "Robin Anil (JIRA)"
To: mahout-dev@lucene.apache.org
Subject: [jira] Issue Comment Edited: (MAHOUT-60) Complementary Naive Bayes
In-Reply-To: <544303546.1212273045058.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Virus-Checked: Checked by ClamAV on apache.org

    [ https://issues.apache.org/jira/browse/MAHOUT-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622040#action_12622040 ]

robinanil edited comment on MAHOUT-60 at 8/12/08 5:11 PM:
-----------------------------------------------------------

I have merged BayesClassifier and CBayesClassifier. Both now share some common MapReduce operations; the classifier-specific MapReduce operations are factored out, as is the Model.

The new feature in this patch is an n-gram generator, enabled with the CLI parameter -ng. If a model is built using 3-grams, you can classify using 1-, 2-, or 3-grams. Try increasing the n-gram size and see how the classification accuracy grows with it.

cbayes.TestTwentyNewsgroups is renamed to bayes.TestClassifier
cbayes.TrainTwentyNewsgroups is renamed to bayes.TrainClassifier

The existing tests will fail with this patch applied, so don't worry. New tests will be put up shortly.

{noformat}
//To train a Bayes classifier using tri-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar org.apache.mahout.examples.classifiers.bayes.TrainClassifier -t -i newstrain -o newsmodel -ng 3 -type bayes

//To test a Bayes classifier using tri-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar org.apache.mahout.examples.classifiers.bayes.TestClassifier -p newsmodel -t work/newstest -ng 3 -type bayes

//To train a CBayes classifier using bi-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar org.apache.mahout.examples.classifiers.bayes.TrainClassifier -t -i newstrain -o newsmodel -ng 2 -type cbayes

//To test a CBayes classifier using bi-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar org.apache.mahout.examples.classifiers.bayes.TestClassifier -p newsmodel -t work/newstest -ng 2 -type cbayes
{noformat}

Hope you enjoy using this patch.

      was (Author: robinanil):
    I have merged BayesClassifier and CBayesClassifier. Both now share some common MapReduce operations; the classifier-specific MapReduce operations are factored out, as is the Model.

The new feature in this patch is an n-gram generator, enabled with the CLI parameter -ng. If a model is built using 3-grams, you can classify using 1-, 2-, or 3-grams. Try increasing the n-gram size and see how the classification accuracy grows with it.

cbayes.TestTwentyNewsgroups is renamed to bayes.TestClassifier
cbayes.TrainTwentyNewsgroups is renamed to bayes.TrainClassifier

The existing tests will fail with this patch applied, so don't worry. New tests will be put up shortly.

{noformat}
//To train a Bayes classifier using tri-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar org.apache.mahout.examples.classifiers.bayes.TrainClassifier -t -i newstrain -o newsmodel -ng 3 -type bayes

//To test a Bayes classifier using tri-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar org.apache.mahout.examples.classifiers.bayes.TestClassifier -p newsmodel -t work/newstest -ng 3 -type bayes

//To train a CBayes classifier using bi-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar org.apache.mahout.examples.classifiers.bayes.TrainClassifier -t -i newstrain -o newsmodel -ng 2 -type bayes

//To test a CBayes classifier using bi-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar org.apache.mahout.examples.classifiers.bayes.TestClassifier -p newsmodel -t work/newstest -ng 2 -type cbayes
{noformat}

Hope you enjoy using this patch.

> Complementary Naive Bayes
> -------------------------
>
>                 Key: MAHOUT-60
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-60
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Classification
>            Reporter: Robin Anil
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: country.txt, MAHOUT-60-13082008.patch, MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, twcnb.jpg
>
>
> The focus is to implement an improved text classifier based on this paper http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
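
For readers unfamiliar with the -ng option mentioned in the comment, here is a minimal sketch of what an n-gram generator does. It is written in Python for brevity and is not the code from the patch. The function name ngrams_up_to and the choice to emit every size from 1 up to n are assumptions for illustration; the second is one plausible reading of why a model built with 3-grams could also be used to classify with 1- or 2-grams.

```python
from typing import List


def ngrams_up_to(tokens: List[str], max_n: int) -> List[str]:
    """Return every contiguous n-gram of size 1..max_n, joined by spaces.

    Hypothetical helper for illustration only; not taken from the patch.
    """
    if max_n < 1:
        raise ValueError("max_n must be at least 1")
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token list.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


if __name__ == "__main__":
    words = ["naive", "bayes", "text", "classifier"]
    for size in (1, 2, 3):
        print(size, ngrams_up_to(words, size))
```

Under this assumption, the features produced with a smaller n are always a subset of those produced with a larger n, so a model trained with -ng 3 would already contain the features needed to classify with -ng 1 or -ng 2.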