From: Sean Owen
To: dev@mahout.apache.org
Date: Sun, 10 Oct 2010 10:56:11 +0100
Subject: Re: TrainNewsGroups for SGD

I can commit Joe's fix for the ".DS_Store" problem. It seems like a clear bug, so it is valid to change even during the quiet period.

I will also commit a change that removes one level of exception chaining from that second stack trace. There is no need to have ExecutionException in there, and it obscures the cause. Beyond that, I don't know more about the error itself.

On Sun, Oct 10, 2010 at 5:25 AM, Joe Kumar wrote:
> Ted,
>
> I just started testing TrainNewsGroups and am running it from Eclipse,
> passing the location of the 20news-18828 directory to the program.
>
> I encountered an exception when the code tried to read the files inside
> each newsgroup directory using
> files.addAll(Arrays.asList(newsgroup.listFiles()));
> The directory of newsgroups contained a .DS_Store file, which made the
> above code throw an exception.
> So I modified the code as follows to fix it:
>
> if (newsgroup.isDirectory()) {
>     files.addAll(Arrays.asList(newsgroup.listFiles()));
> }
>
> After fixing this, I get the following log and exception:
>
> 18828 training files
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 1 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 2 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 3 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 4 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 6 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 8 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 10 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 12 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 15 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 20 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 25 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 30 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 40 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 50 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 60 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 70 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 80 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 100 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 120 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 140 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 150 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 200 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 250 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 300 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 400 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 500 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 600 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 700 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.00000000 0.00000000 800 0.000 0.00 none
>
> Exception in thread "main" java.lang.IllegalStateException: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 19
>     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:137)
>     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:111)
>     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:97)
>     at org.apache.mahout.classifier.sgd.TrainNewsGroups.main(TrainNewsGroups.java:164)
> Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 19
>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>     at org.apache.mahout.ep.EvolutionaryProcess.parallelDo(EvolutionaryProcess.java:154)
>     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:117)
>     ... 3 more
>
> I am not sure whether I am doing something wrong. I thought I'd check with
> you, and also document the process of running this example along with
> other details about SGD.
>
> Regards,
>
> Joe.
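For reference, here is a minimal standalone sketch of the directory guard Joe describes. The class name NewsgroupFiles, the method collectTrainingFiles and the baseDir parameter are hypothetical and are not the actual TrainNewsGroups code. File.listFiles() returns null when called on something that is not a directory, such as a stray .DS_Store file, and Arrays.asList on a null array throws a NullPointerException. That is most likely the exception Joe saw. The sketch therefore skips non-directories and also null-checks each listing.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NewsgroupFiles {
    // Hypothetical helper: collects the documents found in each newsgroup
    // subdirectory of baseDir, ignoring regular files such as .DS_Store.
    public static List<File> collectTrainingFiles(File baseDir) {
        List<File> files = new ArrayList<>();
        File[] newsgroups = baseDir.listFiles();
        if (newsgroups == null) {
            // listFiles() returns null if baseDir is not a readable directory.
            return files;
        }
        for (File newsgroup : newsgroups) {
            // Only descend into directories; a plain file would yield null here.
            if (newsgroup.isDirectory()) {
                File[] docs = newsgroup.listFiles();
                if (docs != null) {
                    files.addAll(Arrays.asList(docs));
                }
            }
        }
        return files;
    }

    public static void main(String[] args) {
        // Assumed default path; pass the real 20news-18828 location as args[0].
        File base = new File(args.length > 0 ? args[0] : "20news-18828");
        System.out.println(collectTrainingFiles(base).size() + " training files");
    }
}
```

The extra null check on each subdirectory listing also covers directories that exist but cannot be read.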
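To illustrate what removing one level of chaining means for the trace above, here is a generic sketch of waiting on thread-pool futures and rethrowing the original cause instead of wrapping the ExecutionException. This is not Mahout's EvolutionaryProcess.parallelDo or the actual commit. The class UnwrapExecutionException and the method runAll are hypothetical names chosen for the example, and only standard java.util.concurrent calls are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class UnwrapExecutionException {
    // Hypothetical helper: runs all tasks on a fixed pool and returns their
    // results in task order. If a task failed, the ExecutionException wrapper
    // is dropped and the original cause is rethrown, so the stack trace shows
    // the real failure rather than FutureTask.get().
    public static <T> List<T> runAll(List<Callable<T>> tasks, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = pool.invokeAll(tasks);
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    if (cause instanceof RuntimeException) {
                        throw (RuntimeException) cause;
                    }
                    if (cause instanceof Error) {
                        throw (Error) cause;
                    }
                    // A checked cause still needs a wrapper, but it is chained directly.
                    throw new IllegalStateException(cause);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Callable<Integer>> tasks = new ArrayList<>();
        tasks.add(() -> 42);
        // Deliberately indexes past the end of the array to mimic the reported failure.
        tasks.add(() -> {
            int[] weights = new int[19];
            return weights[19];
        });
        try {
            runAll(tasks, 2);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The caught exception is the task's own exception, not an ExecutionException.
            System.out.println("caught original cause: " + e);
        }
    }
}
```

Only unchecked causes are rethrown as-is. A checked cause still gets one wrapper because runAll does not declare arbitrary checked exceptions.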