mahout-dev mailing list archives

From "Deneche A. Hakim (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-932) RandomForest quits with ArrayIndexOutOfBoundsException while running sample
Date Sun, 18 Dec 2011 17:08:30 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171901#comment-13171901 ]

Deneche A. Hakim commented on MAHOUT-932:
-----------------------------------------

ARFF isn't supported; you need to remove the header from the file first. It's not really a
big deal to make it work with the sequential classifier, but it's much more complicated when
it comes to the distributed one.
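
If you want to keep the ARFF file around, stripping the header by hand is enough. Here is a
rough sketch of that (my own throwaway code, not a Mahout class; it assumes the header ends
at the usual "@data" marker, and the class and file names are only illustrative):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Copies everything after the ARFF "@data" marker into a headerless data file.
public class StripArffHeader {
  public static void main(String[] args) throws IOException {
    BufferedReader in = new BufferedReader(new FileReader("KDDTrain+.arff"));
    PrintWriter out = new PrintWriter(new FileWriter("KDDTrain+.data"));
    try {
      boolean inData = false;
      String line;
      while ((line = in.readLine()) != null) {
        if (inData && line.trim().length() > 0) {
          out.println(line);                 // data rows are copied unchanged
        } else if (line.trim().equalsIgnoreCase("@data")) {
          inData = true;                     // the header ends here
        }
      }
    } finally {
      in.close();
      out.close();
    }
  }
}

The remaining lines should then be in the same headerless, comma-separated form as the UCI
data used in the Breiman example.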

It's strange that BuildForest throws an exception; I tried that same example a week ago and
it ran well. Which file are you using to train the classifier? KDDTrain+.ARFF?

Two more questions: which version of Hadoop are you using, and how is it installed: standalone,
pseudo-distributed, or fully distributed?
                
> RandomForest quits with ArrayIndexOutOfBoundsException while running sample
> ---------------------------------------------------------------------------
>
>                 Key: MAHOUT-932
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-932
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.6
>         Environment: Mac OS X, current Mac OS shipped Java version, latest checkout from 17.12.2011
> Dual Core MacBook Pro 2009, 8 GB, SSD
>            Reporter: Berttenfall M.
>            Priority: Minor
>              Labels: Classifier, DecisionForest, RandomForest
>
> Hello,
> when running the example at https://cwiki.apache.org/MAHOUT/partial-implementation.html with the recommended data sets, several issues occur.
> First, ARFF files no longer seem to be supported, so I have been using the UCI format as recommended here (https://cwiki.apache.org/MAHOUT/breiman-example.html). With ARFF files, Mahout quits while creating the description file (wrong number of attributes in the string); with the UCI format it works.
> The main error happens during the BuildForest step (I could not test TestForest because no forest was built).
> Running:
> $MAHOUT_HOME/bin/mahout org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d convertedData/data.data -ds KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest
> I tested different mapred.max.split.size values: 1874231, 187423, and 18742 all give the following error; 1874 does not finish on my machine (Dual Core MacBook Pro 2009, 8 GB, SSD).
> It quits after a while (map is almost done) with the following message:
> 11/12/17 16:23:24 INFO mapred.Task: Task 'attempt_local_0001_m_000998_0' done.
> 11/12/17 16:23:24 INFO mapred.Task: Task:attempt_local_0001_m_000999_0 is done. And is in the process of commiting
> 11/12/17 16:23:24 INFO mapred.LocalJobRunner: 
> 11/12/17 16:23:24 INFO mapred.Task: Task attempt_local_0001_m_000999_0 is allowed to commit now
> 11/12/17 16:23:24 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000999_0' to file:/Users/martin/Documents/Studium/Master/LargeScaleProcessing/Repository/mahout_algorithms_evaluation/testingRandomForests/nsl-forest
> 11/12/17 16:23:27 INFO mapred.LocalJobRunner: 
> 11/12/17 16:23:27 INFO mapred.Task: Task 'attempt_local_0001_m_000999_0' done.
> 11/12/17 16:23:28 INFO mapred.JobClient:  map 100% reduce 0%
> 11/12/17 16:23:28 INFO mapred.JobClient: Job complete: job_local_0001
> 11/12/17 16:23:28 INFO mapred.JobClient: Counters: 8
> 11/12/17 16:23:28 INFO mapred.JobClient:   File Output Format Counters 
> 11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Written=41869032
> 11/12/17 16:23:28 INFO mapred.JobClient:   FileSystemCounters
> 11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_READ=37443033225
> 11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44946910704
> 11/12/17 16:23:28 INFO mapred.JobClient:   File Input Format Counters 
> 11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Read=20478569
> 11/12/17 16:23:28 INFO mapred.JobClient:   Map-Reduce Framework
> 11/12/17 16:23:28 INFO mapred.JobClient:     Map input records=125973
> 11/12/17 16:23:28 INFO mapred.JobClient:     Spilled Records=0
> 11/12/17 16:23:28 INFO mapred.JobClient:     Map output records=100000
> 11/12/17 16:23:28 INFO mapred.JobClient:     SPLIT_RAW_BYTES=215000
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 100
> 	at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:126)
> 	at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:89)
> 	at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:303)
> 	at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:201)
> 	at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:163)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:225)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> 	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> PS: I adjusted the class name to the .classifier.df. package and removed -oop

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
