From: "Berttenfall M. (Created) (JIRA)"
To: dev@mahout.apache.org
Date: Sun, 18 Dec 2011 13:52:30 +0000 (UTC)
Message-ID: <1362077512.24251.1324216350703.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample

RandomForest quits with ArrayIndexOutOfBoundsException while running sample
---------------------------------------------------------------------------

                 Key: MAHOUT-932
                 URL: https://issues.apache.org/jira/browse/MAHOUT-932
             Project: Mahout
          Issue Type: Bug
          Components: Classification
    Affects Versions: 0.6
         Environment: Mac OS X, Java version currently shipped with Mac OS, latest checkout from 17.12.2011
                      Dual Core MacBook Pro 2009, 8 GB, SSD
            Reporter: Berttenfall M.
            Priority: Minor


Hello,

when running the example under https://cwiki.apache.org/MAHOUT/partial-implementation.html with the recommended data sets, several issues occur.

First: ARFF files no longer seem to be supported. I have been using the UCI format as recommended here (https://cwiki.apache.org/MAHOUT/breiman-example.html). With ARFF files, Mahout quits when creating the description file (wrong number of attributes in the string); with the UCI format it works.

The main error happens during the BuildForest step (I could not test TestForest, due to the missing tree).

Running:
$MAHOUT_HOME/bin/mahout org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d convertedData/data.data -ds KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest

I tested different split.size values. 1874231, 187423, and 18742 give the following error. 1874 does not finish on my machine (Dual Core MacBook Pro 2009, 8 GB, SSD).

It quits after a while (map is almost done) with the following message:

11/12/17 16:23:24 INFO mapred.Task: Task 'attempt_local_0001_m_000998_0' done.
11/12/17 16:23:24 INFO mapred.Task: Task:attempt_local_0001_m_000999_0 is done. And is in the process of commiting
11/12/17 16:23:24 INFO mapred.LocalJobRunner:
11/12/17 16:23:24 INFO mapred.Task: Task attempt_local_0001_m_000999_0 is allowed to commit now
11/12/17 16:23:24 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000999_0' to file:/Users/martin/Documents/Studium/Master/LargeScaleProcessing/Repository/mahout_algorithms_evaluation/testingRandomForests/nsl-forest
11/12/17 16:23:27 INFO mapred.LocalJobRunner:
11/12/17 16:23:27 INFO mapred.Task: Task 'attempt_local_0001_m_000999_0' done.
11/12/17 16:23:28 INFO mapred.JobClient:  map 100% reduce 0%
11/12/17 16:23:28 INFO mapred.JobClient: Job complete: job_local_0001
11/12/17 16:23:28 INFO mapred.JobClient: Counters: 8
11/12/17 16:23:28 INFO mapred.JobClient:   File Output Format Counters
11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Written=41869032
11/12/17 16:23:28 INFO mapred.JobClient:   FileSystemCounters
11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_READ=37443033225
11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44946910704
11/12/17 16:23:28 INFO mapred.JobClient:   File Input Format Counters
11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Read=20478569
11/12/17 16:23:28 INFO mapred.JobClient:   Map-Reduce Framework
11/12/17 16:23:28 INFO mapred.JobClient:     Map input records=125973
11/12/17 16:23:28 INFO mapred.JobClient:     Spilled Records=0
11/12/17 16:23:28 INFO mapred.JobClient:     Map output records=100000
11/12/17 16:23:28 INFO mapred.JobClient:     SPLIT_RAW_BYTES=215000
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 100
	at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:126)
	at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:89)
	at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:303)
	at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:201)
	at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:163)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:225)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

PS: I adjusted the class name to .classifier.df. and removed -oop.
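
A note on the numbers in the log: the last map attempt is m_000999, which suggests 1000 map tasks if numbering starts at 0, and the job reports 100000 map output records, i.e. 100 per task, the same as the -t 100 tree count. The exception index is also 100. One possible reading (an assumption, not confirmed by the report or by the Mahout source) is that the output is collected into an array sized for the 100 requested trees while more than 100 entries arrive. The Java sketch below only illustrates that failure mode; TreeSlotSketch and firstOverflowIndex are hypothetical names and are not part of Mahout.

```java
/**
 * Hypothetical illustration only: this is NOT Mahout's PartialBuilder code.
 * It shows how an array sized for the requested tree count (-t 100) overflows
 * at index 100 when one entry is stored per map output record.
 */
public class TreeSlotSketch {

    /**
     * Fills an array of requestedTrees slots with one entry per output record.
     * Returns the index at which the write failed, or -1 if every record fit.
     */
    static int firstOverflowIndex(int requestedTrees, int outputRecords) {
        int[] slots = new int[requestedTrees];
        int index = 0;
        try {
            for (int record = 0; record < outputRecords; record++) {
                slots[index] = record; // throws once index == requestedTrees
                index++;
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            return index;
        }
        return -1;
    }

    public static void main(String[] args) {
        int requestedTrees = 100;   // -t 100 from the command line
        int mapTasks = 1000;        // assumed from attempt ids up to m_000999
        int outputRecords = 100000; // "Map output records=100000" in the log

        System.out.println("records per map task: " + outputRecords / mapTasks);
        System.out.println("first failing index: "
                + firstOverflowIndex(requestedTrees, outputRecords));
    }
}
```

Run with the values from the log, both printed numbers are 100, which matches the index in the exception message. Whether this is the actual cause would have to be checked against PartialBuilder.processOutput itself.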