hadoop-common-user mailing list archives

From Saptarshi Guha <saptarshi.g...@gmail.com>
Subject Re: Import path for hadoop streaming with python
Date Fri, 23 May 2008 00:09:09 GMT
I haven't done this with Hadoop, but before 0.16.4 I had written my
own distributed batch processor using HDFS as common file storage
and remote execution of Python scripts.
They all required a custom module that was copied to the remote temp
folders (a primitive implementation of cacheFile).

So this is what I did: just after #!/usr/bin/env python,

import sys
sys.path.append('.')  # make the task's working directory importable
import mylib
dostuff

so that your module can be found on the current path.
It should work thereafter.
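To illustrate why appending '.' helps, here is a minimal self-contained sketch that simulates a streaming task's working directory. The temp directory and the greet() helper are purely illustrative (not part of Hadoop); the point is that a module dropped into the working directory, as -cacheFile or -file does, only becomes importable once '.' is on sys.path:

```python
import os
import sys
import tempfile

# Simulate the task's working directory: drop a mylib.py into a fresh
# temp dir and chdir there, as Hadoop streaming would with -cacheFile.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'mylib.py'), 'w') as f:
    f.write("def greet():\n    return 'hello from mylib'\n")
os.chdir(workdir)

# The fix: put the current (task) directory on the import path.
sys.path.append('.')
import mylib

print(mylib.greet())  # the shipped module is now importable
```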
Regards
Saptarshi

On May 22, 2008, at 7:39 PM, Martin Blom wrote:

> Hello all,
>
> I'm trying to stream a little python script on my small hadoop
> cluster, and it doesn't work like I thought it would.
>
> The script looks something like
>
> #!/usr/bin/env python
> import mylib
> dostuff
>
> where mylib is a small python library that I want included, and I
> launch the whole thing with something like
>
> bin/hadoop jar contrib/streaming/hadoop-0.16.4-streaming.jar
> -cacheFile "hdfs://master:54310/user/hadoop/mylib.py#mylib.py" -file
> script.py -mapper "script.py" -input input -output output
>
> so it seems to me like the library should be available to the script.
> When I run the script locally on my machine, everything works perfectly
> fine. However, when I run it on the cluster, the script can't find the
> library. Does Hadoop do anything strange to the default paths? Am I
> missing something obvious? Any pointers or ideas on how to fix this
> would be great.
>
> Martin Blom

Saptarshi Guha | saptarshi.guha@gmail.com | http://www.stat.purdue.edu/~sguha

