mahout-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Problems running examples
Date Sat, 11 Jun 2011 11:32:03 GMT
What do you get when you run on good ol' Hadoop, i.e., the one we actually support, build,
and test on?

On Jun 10, 2011, at 7:38 PM, Jeff Eastman wrote:

> Moving to @dev
> 
> Hi Drew,
> 
> Don't know what is happening, but I did a clean unpack of the 0.5 distro, mvn install and ran build-reuters.sh. It downloaded the data but failed exactly as before. Both continue to run just fine on my trunk build since I updated yesterday. IIRC, they were both failing with trunk before 0.5 too.
> 
> On MapR:
> [dev@devbox mahout-distribution-0.5]$ ./examples/bin/build-reuters.sh
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> Downloading Reuters-21578
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
> 100 7959k  100 7959k    0     0  1769k      0  0:00:04  0:00:04 --:--:-- 1788k
> Extracting...
> Running on hadoop, using HADOOP_HOME=/opt/mapr/hadoop/hadoop-0.20.2
> HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-0.20.2/conf.new
> 11/06/10 16:12:19 WARN driver.MahoutDriver: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
> Deleting all files in mahout-work/reuters-out-tmp
> 11/06/10 16:12:24 INFO driver.MahoutDriver: Program took 4085 ms
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/mahout-examples-0.5-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/dependency/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> Jun 10, 2011 4:12:25 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=mahout-work/reuters-out, --keyPrefix=, --output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: maprfs
>        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
>        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>        at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:62)
>        at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> rmr: cannot remove mahout-work/reuters-out-seqdir: No such file or directory.
> put: File mahout-work/reuters-out-seqdir does not exist.
> 
> And then, after changing HADOOP_HOME & HADOOP_CONF_DIR to CDH3 on a fresh untar/install of 0.5:
> [dev@devbox mahout-distribution-0.5]$ ./examples/bin/build-reuters.sh
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> Downloading Reuters-21578
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
> 100 7959k  100 7959k    0     0  1707k      0  0:00:04  0:00:04 --:--:-- 1768k
> Extracting...
> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
> HADOOP_CONF_DIR=/usr/lib/hadoop/hadoop1.conf
> 11/06/10 16:29:42 WARN driver.MahoutDriver: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
> Deleting all files in mahout-work/reuters-out-tmp
> 11/06/10 16:29:45 INFO driver.MahoutDriver: Program took 3669 ms
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/mahout-examples-0.5-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/dependency/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> Jun 10, 2011 4:30:02 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=mahout-work/reuters-out, --keyPrefix=, --output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
> Exception in thread "main" java.io.IOException: Call to hadoop1.eng.narus.com/172.31.2.200:8020 failed on local exception: java.io.EOFException
>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>        at org.apache.hadoop.ipc.Client.call(Client.java:743)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>        at $Proxy0.getProtocolVersion(Unknown Source)
>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
>        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
>        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
>        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>        at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:62)
>        at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> Caused by: java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> rmr: cannot remove mahout-work/reuters-out-seqdir: No such file or directory.
> put: File mahout-work/reuters-out-seqdir does not exist.
> 
> I do notice that, after each of these runs on a pristine untar/install, I get a slightly different initial output but the same exception:
> [dev@devbox mahout-distribution-0.5]$ ./examples/bin/build-reuters.sh
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/mahout-examples-0.5-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/dependency/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> Jun 10, 2011 4:33:07 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=mahout-work/reuters-out, --keyPrefix=, --output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
> Exception in thread "main" java.io.IOException: Call to hadoop1.eng.narus.com/172.31.2.200:8020 failed on local exception: java.io.EOFException
>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>        at org.apache.hadoop.ipc.Client.call(Client.java:743)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>        at $Proxy0.getProtocolVersion(Unknown Source)
> 
> There is no $MAHOUT_LOCAL in my environment but I notice the script does set this internally. Something must be different in trunk but I cannot find it.
> 
> -----Original Message-----
> From: Drew Farris [mailto:drew@apache.org]
> Sent: Friday, June 10, 2011 2:57 PM
> To: user@mahout.apache.org
> Subject: Re: Problems running examples
> 
> Hmm, I've been able to download the 0.5 src release and run it in
> clustered mode. In most cases it completes fine. I ran into problems
> once when I had left a mahout-work directory lying around from a
> partially completed (aborted) run. I wonder if that could have
> something to do with the failures you are seeing too, Jeff?
> 
> The binary release of 0.5 is most definitely broken, but that breakage
> was discussed in another thread and is due to classpath issues in
> bin/mahout vs. where things are placed in the binary release.
> 
> On Fri, Jun 10, 2011 at 12:34 PM, Jeff Eastman <jeastman@narus.com> wrote:
>> I'm still trying to figure out why reuters-0.5 does not work on either of my clusters. The scripts themselves have no diff and the environment variables are set as in trunk except for MAHOUT_HOME. The synthetic control and 20 newsgroups examples run on both clusters without problems (well, 20 newsgroups has a Version Mismatch error on CDH3, but that is another story). But when I run reuters on 0.5 I see "MAHOUT_LOCAL is set, running locally" followed by file IO exceptions in MahoutDriver that are cluster dependent. When I run it on trunk, I don't see this and it works just fine.
>> 
>> -----Original Message-----
>> From: Drew Farris [mailto:drew@apache.org]
>> Sent: Thursday, June 09, 2011 5:36 PM
>> To: user@mahout.apache.org
>> Subject: Re: Problems running examples
>> 
>> Jeff, no impugning perceived, and thanks for running the variety of
>> tests. So it appears that trunk is fine and 0.5 isn't. I'll try to
>> determine what did (or didn't) make it into 0.5 that causes its
>> brokenness.
>> 
>> Mark, in the meantime, no need to run all of the tests I've asked
>> about previously. Just give trunk a try and see if that resolves your
>> problem.
>> 
>> On Thu, Jun 9, 2011 at 7:21 PM, Jeff Eastman <jeastman@narus.com> wrote:
>>> Hi Drew,
>>> 
>>> Running trunk locally, latest update, just now, build-reuters.sh works (kmeans and lda).
>>> 
>>> Running trunk on my CDH3 cluster, just now:
>>> - build-cluster-syntheticcontrol.sh works (with kmeans and others)
>>> - build-reuters.sh works (with kmeans and lda)
>>> 
>>> Running trunk on my MapR cluster, just now:
>>> - build-cluster-syntheticcontrol.sh works (with kmeans and others)
>>> - build-reuters.sh works (with kmeans and lda)
>>> 
>>> 
>>> Running the 5/31 mahout-distribution-0.5, just now:
>>> - build-cluster-syntheticcontrol.sh works (CDH3 & MapR with kmeans and others)
>>> - build-reuters.sh runs in local mode only (CDH3 & MapR runs give different errors)
>>> 
>>> I was primarily defending kmeans. It is possible my 5/31 0.5 distribution is not the final one, since everything seems kosher in trunk now. My apologies if I've impugned your patch.
>>> 
>>> Jeff
>>> 
>>> 
>>> -----Original Message-----
>>> From: Drew Farris [mailto:drew@apache.org]
>>> Sent: Thursday, June 09, 2011 11:36 AM
>>> To: user@mahout.apache.org
>>> Subject: Re: Problems running examples
>>> 
>>> Jeff,
>>> 
>>> Could you tell me about what's failing in KMeans and LDA when running
>>> on a cluster? I had this working just prior to 0.5 in
>>> https://issues.apache.org/jira/browse/MAHOUT-694
>>> 
>>> Thanks,
>>> 
>>> Drew
>>> 
>>> On Thu, Jun 9, 2011 at 2:01 PM, Jeff Eastman <jeastman@narus.com> wrote:
>>>> Ahem, KMeans is not busted. It is being maintained by me, at least. The build-reuters.sh script runs only in local mode on 0.5 and fails in both KMeans and LDA when run on a cluster. The MIA examples are not always correct. Most of this has been reported before.
>>>> 
>>>> -----Original Message-----
>>>> From: Sean Owen [mailto:srowen@gmail.com]
>>>> Sent: Thursday, June 09, 2011 12:29 AM
>>>> To: user@mahout.apache.org
>>>> Subject: Re: Problems running examples
>>>> 
>>>> (Assuming you are on HEAD,) I think KMeans is busted -- this has come up
>>>> before. I don't know if it is being maintained.  Anyone who's willing to
>>>> step up and fix it is also welcome to overhaul it IMHO.
>>>> 
>>>> On Thu, Jun 9, 2011 at 12:03 AM, Hector Yee <hector.yee@gmail.com> wrote:
>>>> 
>>>>> I got a slightly different error on the next line of KMeansDriver.java
>>>>> (running on OS X Snow Leopard)
>>>>> 
>>>>> 11/06/08 16:02:12 INFO compress.CodecPool: Got brand-new compressor
>>>>> Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.mahout.math.VectorWritable
>>>>> at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:90)
>>>>> at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:102)
>>>>> 
>>>>> 
>>>>> On Sun, Jun 5, 2011 at 9:31 PM, Jeff Eastman <jeastman@narus.com> wrote:
>>>>> 
>>>>>> IIRC, Reuters used to run on a cluster but no longer does due to some
>>>>>> obscure Lucene changes. In 0.5 it only works in local mode. I really hope
>>>>>> this can be repaired by 0.6 as Reuters is a key entry point into Mahout
>>>>>> clustering for many users.
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
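
Jeff's observation in the thread, that MAHOUT_LOCAL is absent from his environment yet the run reports "MAHOUT_LOCAL is set, running locally", could be checked with something like the sketch below. It is only an illustration: check_mahout_local is a hypothetical helper, not part of Mahout, and it assumes nothing about how the Mahout scripts actually use the variable. It reports whether the variable is set in the calling shell and lists the lines of a given script that mention it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): report whether MAHOUT_LOCAL is
# set in the calling shell, then list every line of the given script that
# mentions the variable.
check_mahout_local() {
  script="$1"

  # ${VAR+x} expands to "x" when VAR is set, even if it is set to the
  # empty string, and to nothing when VAR is unset.
  if [ -n "${MAHOUT_LOCAL+x}" ]; then
    echo "MAHOUT_LOCAL is set in the environment: '${MAHOUT_LOCAL}'"
  else
    echo "MAHOUT_LOCAL is not set in the environment"
  fi

  if [ -f "$script" ]; then
    # grep -n prefixes each matching line with its line number.
    grep -n 'MAHOUT_LOCAL' "$script" || echo "no mention of MAHOUT_LOCAL in $script"
  else
    echo "script not found: $script"
  fi
}
```

Running it from the distribution directory as `check_mahout_local ./examples/bin/build-reuters.sh`, and again against the same script in a trunk checkout, would give two listings that can be compared by eye or with diff.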

