mahout-user mailing list archives

From deneche abdelhakim <a_dene...@yahoo.fr>
Subject Re : Question about mahout Describe
Date Sat, 03 Apr 2010 19:21:59 GMT
Hi,

I just committed a new version of TestForest. If you add "-mr" to the command line, it should
launch a Hadoop job to classify the data. This is a basic implementation that can't compute
the confusion matrix yet, so using "-a" has no effect. It is also not well tested (it's a work
in progress), so if you want to try it, select a random subset of your test data and classify
it with the sequential implementation (without -mr), then compare those predictions with the
ones from the distributed implementation. The results won't be exactly the same (because the
classifier breaks ties randomly), but about 90% of the predictions should match.
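One way to run the comparison described above is a small standalone check. This is a hypothetical sketch, not part of Mahout: the class name PredictionAgreement and the file layout are assumptions. It assumes the predictions from both runs have been dumped to plain-text files with one label per line, in the same instance order, and reports the fraction of matching lines against the roughly 90% agreement mentioned above. If the -mr output is not in input order, the predictions would first need to be aligned by instance.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/**
 * Hypothetical helper: measures how often two prediction files agree.
 * Assumes each file holds one predicted label per line, in the same order
 * as the test instances (an assumption, not TestForest's documented format).
 */
public class PredictionAgreement {

    /** Fraction of positions where both lists hold the same label. */
    public static double agreement(List<String> sequential, List<String> distributed) {
        if (sequential.size() != distributed.size()) {
            throw new IllegalArgumentException(
                "prediction counts differ: " + sequential.size() + " vs " + distributed.size());
        }
        if (sequential.isEmpty()) {
            return 1.0; // nothing to compare; treat as full agreement
        }
        int same = 0;
        for (int i = 0; i < sequential.size(); i++) {
            // trim() so trailing whitespace differences don't count as disagreement
            if (sequential.get(i).trim().equals(distributed.get(i).trim())) {
                same++;
            }
        }
        return (double) same / sequential.size();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: sequential predictions, args[1]: -mr predictions (hypothetical files)
        List<String> seq = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> mr = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        double rate = agreement(seq, mr);
        System.out.printf("agreement: %.1f%%%n", rate * 100);
        if (rate < 0.9) {
            System.out.println("warning: below the ~90% agreement expected");
        }
    }
}
```

Usage would be something like `java PredictionAgreement seq-predictions.txt mr-predictions.txt` (file names are placeholders). Comparing line by line keeps the check trivial; it relies entirely on both files listing instances in the same order.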

Let me know what you think of it. I'm working on the confusion matrix, but it will take
some time to finish.

--- On Fri 26.3.10, Yang Sun <soushare.com@gmail.com> wrote:

> From: Yang Sun <soushare.com@gmail.com>
> Subject: Question about mahout Describe
> To: mahout-user@lucene.apache.org
> Date: Friday, 26 March 2010, 22:16
> I was testing Mahout recently. It runs great on small test datasets.
> However, when I expand the input to a large dataset directory,
> I get the following error message:
> 
> [localhost]$ hjar examples/target/mahout-examples-0.4-SNAPSHOT.job \
>     org.apache.mahout.df.mapreduce.TestForest -i /user/fulltestdata/* \
>     -ds rf/testdata.info -m rf-testmodel-5-100 -a -o rf/fulltestprediction
> 
> Exception in thread "main" java.io.IOException: Cannot open filename /user/fulltestdata/*
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1465)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:372)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:351)
>         at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:190)
>         at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:137)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:228)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> My question is: can I use Mahout on directories instead of single files?
> If so, how?
> 
> Thanks,
> 
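The stack trace above shows the -i value reaching FileSystem.open, which opens a single path and does not expand wildcards; that would explain "Cannot open filename /user/fulltestdata/*". A caller that wants to process a whole directory has to expand the pattern into concrete files first. On HDFS, Hadoop's FileSystem.globStatus(Path) does this expansion; the sketch below shows the same idea on the local filesystem using java.nio's glob-filtered DirectoryStream, so it runs without Hadoop on the classpath. The class ListInputs is hypothetical and is not how TestForest handles its input; each resulting file could then be classified separately with the sequential implementation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (local filesystem only): expands a directory plus a
 * glob pattern into the regular files it matches, the way a shell expands
 * /user/fulltestdata/*. On HDFS the analogous call is FileSystem.globStatus(Path).
 */
public class ListInputs {

    public static List<Path> expand(Path dir, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        // newDirectoryStream(dir, glob) keeps only entries whose file name matches the glob
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) { // skip subdirectories
                    files.add(p);
                }
            }
        }
        Collections.sort(files); // directory order is unspecified; sort for repeatability
        return files;
    }

    public static void main(String[] args) throws IOException {
        // e.g. args = {"fulltestdata", "*"} for a hypothetical local copy of the test data
        String glob = args.length > 1 ? args[1] : "*";
        for (Path p : expand(Paths.get(args[0]), glob)) {
            System.out.println(p);
        }
    }
}
```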