hadoop-common-dev mailing list archives

From Andy Isaacson <...@cloudera.com>
Subject Re: python streaming error
Date Mon, 14 Jan 2013 22:24:30 GMT
Oh, another link I should have included!
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/

-andy

On Mon, Jan 14, 2013 at 2:19 PM, Andy Isaacson <adi@cloudera.com> wrote:
> Hadoop Streaming does not magically teach Python open() how to read
> from "hdfs://" URLs. You'll need to use a library or fork a "hdfs dfs
> -cat" to read the file for you.
>
> A few links that may help:
>
> http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
> http://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs
> https://bitbucket.org/turnaev/cyhdfs
>
> -andy
>
> On Sat, Jan 12, 2013 at 12:30 AM, springring <springring@126.com> wrote:
>> Hi,
>>
>>      When I run the code below as a streaming job, it fails with error "N/A" and is
>> killed. Running it step by step, I found that the error occurs at
>> " file_obj = open(file) ". When I run the same code outside of Hadoop, everything is
>> OK.
>>
>> #!/bin/env python
>>
>> import sys
>>
>> for line in sys.stdin:
>>     offset,filename = line.split("\t")
>>     file = "hdfs://user/hdfs/catalog3/" + filename
>>     print line
>>     print filename
>>     print file
>>     file_obj = open(file)
>> ..................................
>>
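The suggestion in the reply to fork "hdfs dfs -cat" could look roughly like the sketch below. It is only an illustration under stated assumptions, not a tested fix for the job above: it assumes the `hdfs` command is on the PATH of every task node, that the catalog directory is the absolute path /user/hdfs/catalog3 on the default filesystem, and that counting lines is an acceptable stand-in for whatever the real mapper does with each file. The names `BASE_DIR`, `hdfs_cat_lines` and `run_mapper` are hypothetical.

```python
import subprocess

# Assumption: the catalog lives at this absolute path on the default filesystem.
BASE_DIR = "/user/hdfs/catalog3"


def hdfs_cat_lines(path, base_cmd=("hdfs", "dfs", "-cat")):
    """Yield the lines of an HDFS file by forking `hdfs dfs -cat <path>`."""
    proc = subprocess.Popen(list(base_cmd) + [path], stdout=subprocess.PIPE)
    try:
        for raw in proc.stdout:
            yield raw.decode("utf-8", errors="replace").rstrip("\n")
    finally:
        # Always reap the child, even if the caller stops reading early.
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(
            "%s exited with status %d" % (" ".join(base_cmd), returncode)
        )


def run_mapper(stream, out, base_cmd=("hdfs", "dfs", "-cat")):
    """For each "offset<TAB>filename" record, emit "filename<TAB>line count"."""
    for record in stream:
        record = record.rstrip("\n")  # drop the newline that split() would keep
        if not record:
            continue
        offset, filename = record.split("\t", 1)
        path = BASE_DIR + "/" + filename
        count = sum(1 for _ in hdfs_cat_lines(path, base_cmd))
        out.write("%s\t%d\n" % (filename, count))
```

In a streaming script this would be driven by calling `run_mapper(sys.stdin, sys.stdout)` at the bottom of the file (after `import sys`). Stripping the trailing newline before building the path matters here, because `line.split("\t")` on a raw stdin line leaves "\n" attached to the filename.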
