mahout-user mailing list archives

From Mark <static.void....@gmail.com>
Subject Re: Problems running examples
Date Sun, 05 Jun 2011 18:12:35 GMT
It seems that seqdirectory expects its input to be on HDFS rather than on 
the local filesystem. Running the command below writes an empty output 
directory to HDFS:

MAHOUT_LOCAL=true $MAHOUT seqdirectory \
         -i mahout-work/reuters-out \
         -o mahout-work/reuters-out-seqdir \
         -c UTF-8 -chunk 5

If I put the input directory into HDFS, everything works as expected. 
Does seqdirectory expect its input to be on HDFS, i.e. is this the 
expected behavior? If so, should the example be updated?
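
For reference, the workaround I used can be sketched roughly as below. 
This is only a sketch under some assumptions: `hadoop` is on the PATH, 
the parent directory already exists on HDFS, and DRY_RUN and the two 
helper functions are names made up for this example (they are not part 
of build-reuters.sh). With DRY_RUN left at 1 the commands are only 
printed, not executed.

```shell
#!/bin/sh
# Sketch: stage the local input on HDFS, then run seqdirectory against it.
# DRY_RUN, run and stage_and_convert are hypothetical names for this example.

INPUT=mahout-work/reuters-out
OUTPUT=mahout-work/reuters-out-seqdir
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stage_and_convert() {
    # Copy the local input directory to the same relative path on HDFS.
    run hadoop fs -put "$INPUT" "$INPUT"
    # Same seqdirectory invocation as above, without MAHOUT_LOCAL.
    run "${MAHOUT:-mahout}" seqdirectory -i "$INPUT" -o "$OUTPUT" -c UTF-8 -chunk 5
    # List the output so an empty directory is noticed before k-means runs.
    run hadoop fs -ls "$OUTPUT"
}

stage_and_convert
```

Setting DRY_RUN=0 would actually run the three commands; I have only 
checked by hand that the output directory is non-empty afterwards.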

On 6/5/11 11:07 AM, Mark wrote:
> Hi all. I'm trying to run examples/bin/build-reuters.sh, but I 
> keep running into the following exception.
>
> INFO: Deleting mahout-work/reuters-kmeans-clusters
> Jun 5, 2011 10:29:37 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
> WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Jun 5, 2011 10:29:37 AM org.apache.hadoop.io.compress.CodecPool getCompressor
> INFO: Got brand-new compressor
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:108)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:101)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:58)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
>
> I am also confused by the build-reuters.sh code itself. There 
> seems to be some mismatch between what is expected to be local and 
> what should be on HDFS. For example, the comments on lines 77-79 are:
>
> # we know reuters-out-seqdir exists on a local disk at
> # this point, if we're running in clustered mode,
> # copy it up to hdfs
>
> However, upon inspection you'll notice that reuters-out-seqdir is 
> actually on HDFS. It seems that seqdirectory will never write to 
> local disk, even with the MAHOUT_LOCAL=true flag set.
>
> Any ideas?
>
> Thanks
