From: "Divya"
Reply-To: user@mahout.apache.org
Subject: RE: classification example doubts
Date: Fri, 19 Nov 2010 15:45:48 +0800

For my first question, you say we can put our own input documents in the directory. Should those documents be in the same format as bayes-train-input? If so: I generated my input data using PrepareTwentyNewsgroups and used it as the input for testclassifier, but I didn't get the expected results. As far as I can tell, it didn't read the files in my input directory. I also tried replacing one of the files in my input directory with one of the files from the train-input directory, but the result was the same. Why is it not reading my files?

Results below:

10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.sys.mac.hardware -121323.6282757108 547567.2698760114 -0.2215684445551005 2
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.space -189203.04544769705 547567.2698760114 -0.3455338838834164
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.motorcycles -138625.2628242977 547567.2698760114 -0.25316572127418674
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: rec.autos -136935.18434679657 547567.2698760114 -0.25007919917821886
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: comp.graphics -161979.38306986375 547567.2698760114 -0.29581640828631267
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: talk.politics.misc -159579.70032298338 547567.2698760114 -0.29143396455949216
10/11/19 10:45:12 INFO datastore.InMemoryBayesDatastore: sci.med -183835.5334355675 547567.2698760114 -0.3357314133790253
10/11/19 10:45:12 INFO bayes.TestClassifier:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances   : 0   ?%
Incorrectly Classified Instances : 0   ?%
Total Classified Instances       : 0

=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 a = rec.sport.baseball
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 b = sci.crypt
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 c = rec.sport.hockey
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 d = talk.politics.guns
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 e = soc.religion.christian
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 f = sci.electronics
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 g = comp.os.ms-windows.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 h = misc.forsale
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 i = talk.religion.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 j = alt.atheism
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 k = comp.windows.x
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 l = talk.politics.mideast
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 m = comp.sys.ibm.pc.hardware
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 n = comp.sys.mac.hardware
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 o = sci.space
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 p = rec.motorcycles
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 q = rec.autos
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 r = comp.graphics
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 s = talk.politics.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 0 t = sci.med
Default Category: unknown: 20

10/11/19 10:45:12 INFO driver.MahoutDriver: Program took 5485 ms

Am I missing anything?

Coming to my second question: that means we are testing the classifier against the same inputs we trained on. I still don't understand. My understanding of classification is that we have a set of documents which acts as the model for classifying new documents in the system. Am I right? Doesn't Mahout work the same way?

Third question: yes, I am looking for Mahout's API for classification.

@Jaganadh - Thanks for clearing my doubts.

Regards,
Divya

-----Original Message-----
From: JAGANADH G [mailto:jaganadhg@gmail.com]
Sent: Friday, November 19, 2010 3:09 PM
To: user@mahout.apache.org
Subject: Re: classification example doubts

> 1) I want to know what should go in "bayes-test-input".

After preparing the 20news-group data for training you can separate some documents for testing your classifier. These documents should go to "bayes-test-input". Or you can even put a new set of documents in the directory.

> 2) If we take the Wikipedia example
> https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
>
> To trainclassifier we have used Wikipediainput to generate the model.
>
> To test the classifier we again used wikipediamodel as input and Wikipedia input
> as the test documents directory.
>
> I didn't understand why we are doing so.

We are testing the classifier against the development set we used.

> 3) The last thing I want to know: when we run testclassifier from the
> command line we can see the output.
>
> How can we make use of this output?

Are you looking for Mahout API usage for classification?

--
**********************************
JAGANADH G
http://jaganadhg.freeflux.net/blog
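
The suggestion in the reply to question 1 (separate some documents for testing) could be sketched roughly as below. This is only an illustration of a hold-out split, not part of Mahout: it assumes a raw corpus laid out with one subdirectory per category, and the function name `split_corpus`, the directory arguments, the 20% test fraction and the seed are all hypothetical choices. Each resulting tree would still have to be converted into whatever input format the trainer and tester expect.

```python
import random
import shutil
import tempfile
from pathlib import Path


def split_corpus(src, train_dir, test_dir, test_fraction=0.2, seed=42):
    """Copy each category's documents into separate train and test trees.

    Assumes `src` holds one subdirectory per category (an assumption about
    the raw corpus layout). Returns the tuple (n_train, n_test).
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    n_train = n_test = 0
    for category in sorted(p for p in Path(src).iterdir() if p.is_dir()):
        docs = sorted(f for f in category.iterdir() if f.is_file())
        rng.shuffle(docs)
        n_held_out = int(len(docs) * test_fraction)
        for i, doc in enumerate(docs):
            held_out = i < n_held_out
            target = Path(test_dir if held_out else train_dir) / category.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy(doc, target / doc.name)
            if held_out:
                n_test += 1
            else:
                n_train += 1
    return n_train, n_test


if __name__ == "__main__":
    # Tiny synthetic corpus, only to show the call; the names are made up.
    root = Path(tempfile.mkdtemp())
    for name in ("sci.space", "rec.autos"):
        (root / "raw" / name).mkdir(parents=True)
        for i in range(10):
            (root / "raw" / name / f"{i}.txt").write_text("example document")
    print(split_corpus(root / "raw", root / "train", root / "test"))
```

Because the test documents never appear in the training tree, the resulting accuracy reflects how the model handles unseen documents, which is different from testing against the development set as in the Wikipedia example.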