hadoop-common-user mailing list archives

From Dan Starr <dsta...@gmail.com>
Subject Re: Hadoop Streaming File-not-found error on Cloudera's training VM
Date Thu, 18 Feb 2010 05:58:16 GMT
Todd, Thanks!
This solved it.

-Dan

On Wed, Feb 17, 2010 at 8:00 PM, Todd Lipcon <todd@cloudera.com> wrote:
> Hi Dan,
>
> This is actually a bug in the release you're using. Please run:
>
> $ sudo apt-get update
> $ sudo apt-get install hadoop-0.20
>
> Then restart the daemons (or the entire VM) and give it another go.
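>
> (A sketch of the restart step via the init scripts; the service script
> names here are assumed from the CDH hadoop-0.20 packaging on the VM:
>
> $ sudo /etc/init.d/hadoop-0.20-namenode restart
> $ sudo /etc/init.d/hadoop-0.20-datanode restart
> $ sudo /etc/init.d/hadoop-0.20-jobtracker restart
> $ sudo /etc/init.d/hadoop-0.20-tasktracker restart
>
> Rebooting the whole VM achieves the same thing.)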
>
> Thanks
> -Todd
>
> On Wed, Feb 17, 2010 at 7:56 PM, Dan Starr <dstarr1@gmail.com> wrote:
>> Yes, I have tried that when passing the script.  Just now I tried:
>>
>> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-0.20.1+133-streaming.jar \
>>     -mapper blah.py -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
>>     -input test_input/* -output output -file blah.py
>>
>> And got this error for a map task:
>>
>> java.io.IOException: Cannot run program "blah.py":
>> java.io.IOException: error=2, No such file or directory
>>        at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
>>        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
>>        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>>        ...
>>
>> -Dan
>>
>>
>> On Wed, Feb 17, 2010 at 7:47 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>> Are you passing the python script to the cluster using the -file
>>> option?  e.g. -mapper foo.py -file foo.py
>>>
>>> Thanks
>>> -Todd
>>>
>>> On Wed, Feb 17, 2010 at 7:45 PM, Dan Starr <dstarr1@gmail.com> wrote:
>>>> Hi, I've tried posting this to Cloudera's community support site, but
>>>> the community website getsatisfaction.com is returning various server
>>>> errors at the moment.  I believe the following is an issue with my
>>>> environment inside Cloudera's Training virtual machine.
>>>>
>>>> I've had success running Hadoop streaming on other Hadoop clusters
>>>> and on Cloudera's Training VM in local mode, but I'm currently
>>>> getting an error when running a simple Hadoop streaming job in the
>>>> normal queue-based mode on the Training VM.  My guess is that the
>>>> error described below comes from the worker node not recognizing the
>>>> Python interpreter path in the script's shebang line.
>>>>
>>>> The hadoop command I am executing is:
>>>>
>>>> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-0.20.1+133-streaming.jar \
>>>>     -mapper blah.py -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
>>>>     -input test_input/* -output output
>>>>
>>>> Where the test_input directory contains 3 UNIX formatted, single line files:
>>>>
>>>> training-vm: 3$ hadoop dfs -ls /user/training/test_input/
>>>> Found 3 items
>>>> -rw-r--r--   1 training supergroup         11 2010-02-17 10:48 /user/training/test_input/file1
>>>> -rw-r--r--   1 training supergroup         11 2010-02-17 10:48 /user/training/test_input/file2
>>>> -rw-r--r--   1 training supergroup         11 2010-02-17 10:48 /user/training/test_input/file3
>>>>
>>>> training-vm: 3$ hadoop dfs -cat /user/training/test_input/*
>>>> test_line1
>>>> test_line2
>>>> test_line3
>>>>
>>>> And where blah.py looks like (UNIX formatted):
>>>>
>>>> #!/usr/bin/python
>>>> import sys
>>>> for line in sys.stdin:
>>>>    print line
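>>>>
>>>> (A side note on the script: in Python 2, "print line" appends a
>>>> newline on top of the one each line from stdin already carries, so
>>>> the output gains blank lines.  A sketch of the same identity mapper
>>>> without that quirk:
>>>>
>>>> #!/usr/bin/python
>>>> import sys
>>>> for line in sys.stdin:
>>>>     sys.stdout.write(line)  # line already ends with '\n'
>>>> )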
>>>>
>>>> The resulting Hadoop-Streaming error is:
>>>>
>>>> java.io.IOException: Cannot run program "blah.py":
>>>> java.io.IOException: error=2, No such file or directory
>>>>        at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
>>>>        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
>>>>        ...
>>>>
>>>>
>>>> I get the same error when I place the python script on HDFS and then
>>>> use this in the hadoop command:
>>>>
>>>> ... -mapper hdfs:///user/training/blah.py ...
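>>>>
>>>> (The copy step here was something like:
>>>>
>>>> hadoop dfs -put blah.py /user/training/blah.py
>>>> )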
>>>>
>>>>
>>>> One suggestion found online, which may not apply to Cloudera's
>>>> distribution, is that the first line of the streaming Python script
>>>> (the shebang line) may not point to a valid interpreter path on the
>>>> worker system.  The suggested fix is to use ... -mapper "python
>>>> blah.py" ... in the Hadoop streaming command (sketched with -file
>>>> below).  This doesn't work correctly for me: the lines from the
>>>> input data files also end up being parsed by the Python interpreter.
>>>> It does at least show that python is available on the worker node.
>>>> I have also tried the '-mapper blah.py' approach with the shebang
>>>> line "#!/usr/bin/env python", without success, even though on the
>>>> training VM Python is installed at /usr/bin/python.
>>>>
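>>>> (For reference, a sketch of that workaround combined with -file, so
>>>> the script is shipped into each task's working directory; I haven't
>>>> confirmed whether this combination behaves any differently on the VM:
>>>>
>>>> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-0.20.1+133-streaming.jar \
>>>>     -mapper "python blah.py" -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
>>>>     -input test_input/* -output output -file blah.py
>>>> )
>>>>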
>>>> Maybe the issue is something else.  Any suggestions or insights will
>>>> be helpful.
>>>>
>>>
>>
>
