From: Ted Dunning
Date: Tue, 9 Nov 2010 12:09:55 -0800
Subject: Re: Using NB classifier
To: user@mahout.apache.org

You are close. The correctness test works by computing the classification according to the Naive Bayes model, and that computation is exactly the classification step you are after.

You should also look at TrainNewsGroups in the examples. It shows how to build and run an SGD model, which is the main alternative to the Naive Bayes models in Mahout.

On Tue, Nov 9, 2010 at 11:41 AM, ivek gimmick wrote:
> Hi, I am trying to solve a simple classification problem.
>
> The problem:
> I have a set of texts and I have to categorize them based on their content.
>
> Solution using Mahout:
>
> I understood that I have to convert the input to a sequence file to
> generate the model, and I was able to do this. Now, how do I categorize
> my test data? The 20News example only tests for correctness, but I want
> to do the actual classification.
>
> I am not sure whether I need to write code or can use existing classes
> to classify the test set.
>
> P.S. Sorry if you are seeing this message for the second time.
>
> Regards,
> ~Gim
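To illustrate the reply's point that "testing for correctness" already means computing the Naive Bayes classification, here is a minimal sketch of that computation in plain Java. It is an assumption-laden illustration, not Mahout's API: the class `NaiveBayesSketch` and its `train`/`classify` methods are hypothetical, documents are assumed to be pre-tokenized word lists, and the model is a textbook multinomial Naive Bayes with add-one (Laplace) smoothing. Classifying a new document is the same scoring loop the correctness test runs; the only difference is that the predicted label is kept instead of being compared with a known one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical multinomial Naive Bayes over bag-of-words documents.
// Illustrative sketch only; this is NOT Mahout's classifier API.
class NaiveBayesSketch {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Accumulate counts for one labelled training document.
    void train(String label, List<String> tokens) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    // Return the label with the highest log-posterior score.
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int v = vocabulary.size();
        for (String label : docCounts.keySet()) {
            // log prior: fraction of training documents with this label
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String t : tokens) {
                if (!vocabulary.contains(t)) continue; // ignore words never seen in training
                int c = counts.getOrDefault(t, 0);
                // log likelihood with add-one (Laplace) smoothing
                score += Math.log((c + 1.0) / (total + v));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}

class NaiveBayesDemo {
    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("sports", Arrays.asList("goal", "match", "team"));
        nb.train("tech", Arrays.asList("cpu", "code", "compiler"));
        System.out.println(nb.classify(Arrays.asList("team", "goal"))); // prints "sports"
    }
}
```

Working in log space keeps long documents from underflowing a product of many small probabilities, and smoothing keeps a single unseen word/label pair from ruling out a label entirely.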
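The SGD alternative mentioned in the reply trains a linear model one example at a time. The sketch below shows the core idea as binary logistic regression updated by plain stochastic gradient descent. It is not the code in TrainNewsGroups or Mahout's SGD classes: the class `SgdLogisticSketch`, its dense `double[]` features, and the fixed learning rate are all hypothetical simplifications, and feature hashing, regularization, and multi-class output are omitted.

```java
// Hypothetical binary logistic regression trained by plain SGD.
// Illustrative sketch only; this is NOT Mahout's SGD implementation.
class SgdLogisticSketch {
    private final double[] weights;
    private double bias = 0.0;
    private final double learningRate;

    SgdLogisticSketch(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures];
        this.learningRate = learningRate;
    }

    // Sigmoid of the linear score: estimated P(y = 1 | x).
    double probability(double[] x) {
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One SGD step on a single example with label y in {0, 1}.
    // (y - p) is the gradient of the log-likelihood w.r.t. the linear score.
    void train(double[] x, int y) {
        double error = y - probability(x);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += learningRate * error * x[i];
        }
        bias += learningRate * error;
    }

    int classify(double[] x) {
        return probability(x) >= 0.5 ? 1 : 0;
    }
}

class SgdDemo {
    public static void main(String[] args) {
        SgdLogisticSketch model = new SgdLogisticSketch(2, 0.5);
        double[] a = {1.0, 0.0};
        double[] b = {0.0, 1.0};
        // Several passes over a tiny, linearly separable training set.
        for (int epoch = 0; epoch < 100; epoch++) {
            model.train(a, 1);
            model.train(b, 0);
        }
        System.out.println(model.classify(a) + " " + model.classify(b)); // prints "1 0"
    }
}
```

Because each update touches only one example, this style of model can be trained in a single streaming pass over large data, which is part of why it is offered as an alternative to the count-based Naive Bayes approach.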