hadoop-user mailing list archives

From Andrei <faithlessfri...@gmail.com>
Subject Re: How to import custom Python module in MapReduce job?
Date Mon, 12 Aug 2013 12:13:36 GMT
For some reason, using the -archives option leads to "Error in configuring
object" without any further information. However, I found that the -files
option works well for this purpose. I was able to run my example as follows:

1. I put `main.py` and `lib.py` into an `app` directory.
2. In `main.py` I import `lib.py` directly, that is, the import statement is just

    import lib

3. Instead of uploading to HDFS and using the -archives option, I just pointed
to the `app` directory in the -files option:

    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
        -files app \
        -mapper "app/main.py map" \
        -reducer "app/main.py reduce" \
        -input input -output

It did the trick. Note that I tested with both CPython (2.6) and PyPy
(1.9), so I think it's quite safe to assume this approach is correct for Python
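
The plain `import lib` presumably works because Python puts the directory of the script being run at the front of `sys.path`, so a `lib.py` sitting next to `main.py` is found without any package setup. Below is a minimal local sketch of that layout, with no Hadoop involved. The contents of `lib.py` and `main.py` (a word count, the `tokenize` helper) are hypothetical, since the message does not show them, and the harness calls the interpreter explicitly instead of relying on a shebang line as the streaming command does.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical module contents; the original message does not show them.
LIB_SOURCE = textwrap.dedent('''\
    def tokenize(line):
        """Split a line into lowercase words."""
        return line.lower().split()
''')

MAIN_SOURCE = textwrap.dedent('''\
    import sys

    import lib  # found via sys.path[0], the directory containing main.py


    def run_map():
        for line in sys.stdin:
            for word in lib.tokenize(line):
                sys.stdout.write("%s\\t1\\n" % word)


    def run_reduce():
        counts = {}
        for line in sys.stdin:
            word, n = line.rstrip("\\n").split("\\t")
            counts[word] = counts.get(word, 0) + int(n)
        for word in sorted(counts):
            sys.stdout.write("%s\\t%d\\n" % (word, counts[word]))


    if __name__ == "__main__":
        # The first argument selects the stage: "map" or "reduce".
        {"map": run_map, "reduce": run_reduce}[sys.argv[1]]()
''')


def run_stage(workdir, stage, stdin_text):
    """Run `app/main.py <stage>` with workdir as the current directory."""
    proc = subprocess.run(
        [sys.executable, os.path.join("app", "main.py"), stage],
        input=stdin_text, capture_output=True, text=True,
        cwd=workdir, check=True,
    )
    return proc.stdout


def demo():
    """Create app/lib.py and app/main.py, then chain map, sort, reduce."""
    with tempfile.TemporaryDirectory() as workdir:
        app_dir = os.path.join(workdir, "app")
        os.mkdir(app_dir)
        for name, source in (("lib.py", LIB_SOURCE), ("main.py", MAIN_SOURCE)):
            with open(os.path.join(app_dir, name), "w") as handle:
                handle.write(source)
        mapped = run_stage(workdir, "map", "b a\nA b\n")
        # Stand-in for the sort Hadoop applies between the two stages.
        shuffled = "".join(sorted(mapped.splitlines(True)))
        return run_stage(workdir, "reduce", shuffled)


if __name__ == "__main__":
    print(demo(), end="")
```

The generated `main.py` and `lib.py` avoid version-specific syntax, so they should run under both Python 2 and Python 3. Only the harness around them needs Python 3.7 or later.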

Thanks for your help, Binglin; without it I wouldn't have been able to figure
it out.

On Mon, Aug 12, 2013 at 1:12 PM, Binglin Chang <decstery@gmail.com> wrote:

> Maybe you didn't specify a symlink name in your command line, so the symlink
> name will be just lib.jar, and I am not sure how you import the lib module in
> your main.py file.
> Please try this:
> put main.py and lib.py in the same archive file, e.g. app.zip
> -archives hdfs://hdfs-namenode/user/me/app.zip#app -mapper "app/main.py map" -reducer "app/main.py reduce"
> in main.py:
> import app.lib
> or:
> from . import lib
