From: Ted Dunning
Date: Mon, 18 Oct 2010 11:41:19 -0700
Subject: Re: Querry regarding use of classifier in Mahout
To: user@mahout.apache.org
Cc: robin.anil@gmail.com

Remember, these numbers are on the training data!

Naive Bayes classifiers have the property that they overfit massively but still give good results on held-out data. Thus, when tested on the same data they were trained on, they produce results that are unrealistically good.

This is still an important thing to look at. It just doesn't really mean a 200-times-lower error rate than any other result ever reported on this dataset.

On Mon, Oct 18, 2010 at 11:26 AM, JAGANADH G wrote:

> >> > Correctly Classified Instances   : 1995   99.75%
> >> > Incorrectly Classified Instances :    5    0.25%
> >> > Total Classified Instances       : 2000
> >> >
> >> > =======================================================
> >> > Confusion Matrix
> >> > -------------------------------------------------------
> >> >        a        b   <--Classified as
> >> >      995        5   |  1000   a = pos
> >> >        0     1000   |  1000   b = neg
> >> > Default Category: unknown: 2
> >> >
> >>
> >> With some pruning, you will have a decent enough classifier for sentiments
> >
>
> Wow this is an amazing result :-)
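The point about training data is that accuracy should be measured on documents the classifier never saw while training. Below is a minimal, library-independent sketch in Python; it is not Mahout's API. `holdout_split` (a hypothetical helper) shuffles labelled examples with a fixed seed and splits them into disjoint training and test sets. `confusion_matrix` and `accuracy_and_error` (also hypothetical) rebuild a matrix like the one quoted in the message and derive its rates. Plugging in the quoted counts gives 1995/2000 = 99.75% accuracy and a 0.25% error rate, which is exactly the training-set figure that overstates how the model would do on held-out data.

```python
import random


def holdout_split(examples, test_fraction=0.2, seed=42):
    """Shuffle a copy of `examples` and split it into disjoint (train, test) lists.

    Hypothetical helper: train the classifier on `train` only and
    evaluate it on `test` only.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(pairs, labels):
    """Build matrix[true][predicted] from (true_label, predicted_label) pairs."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted in pairs:
        matrix[index[true_label]][index[predicted]] += 1
    return matrix


def accuracy_and_error(matrix):
    """Accuracy is the diagonal (correct) count over the total; error is the rest."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total, (total - correct) / total


# Rebuild the quoted training-set result: 1000 "pos" and 1000 "neg" documents.
pairs = ([("pos", "pos")] * 995 + [("pos", "neg")] * 5
         + [("neg", "neg")] * 1000)
matrix = confusion_matrix(pairs, ["pos", "neg"])
accuracy, error = accuracy_and_error(matrix)
print(matrix)                                         # [[995, 5], [0, 1000]]
print(f"{accuracy:.2%} correct, {error:.2%} wrong")   # 99.75% correct, 0.25% wrong
```

For a realistic estimate, train only on `train` and fill the confusion matrix only from predictions on `test`; the fixed seed just makes the split reproducible between runs.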