hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Executing a Python program inside Map Function
Date Sat, 26 Jan 2013 16:00:53 GMT
Java provides the Process class (instances are created with
ProcessBuilder.start() or Runtime.exec()) to help you launch processes
and read from or write to them:
http://docs.oracle.com/javase/6/docs/api/java/lang/Process.html. You
can use it to spawn your program from your code, write input into the
process's stdin, and read its output from its stdout (and stderr). The
hadoop-streaming component of Apache Hadoop works in a very similar
way, but it leaves little control in the hands of a Java map class,
which you seem to require.
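A rough sketch of the Process approach follows. It assumes Java 8 or later, and the class name ExternalFilter is hypothetical. It launches the command with ProcessBuilder, feeds records to the child's stdin from a separate thread (so a child that writes a lot of output cannot deadlock the pipes while we are still writing), collects stdout line by line, and treats a non-zero exit status as an error.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExternalFilter {

    /** Runs `command`, writes `inputLines` to its stdin, returns its stdout lines. */
    public static List<String> run(List<String> command, List<String> inputLines)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Send the child's stderr to our own stderr (the task log) instead of
        // mixing it into the output we parse.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process proc = pb.start();

        // Write stdin on its own thread; closing the writer sends EOF to the child.
        IOException[] writeError = new IOException[1];
        Thread writer = new Thread(() -> {
            try (Writer w = new BufferedWriter(new OutputStreamWriter(
                    proc.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String line : inputLines) {
                    w.write(line);
                    w.write('\n');
                }
            } catch (IOException e) {
                writeError[0] = e;
            }
        });
        writer.start();

        // Read everything the child prints until it closes stdout.
        List<String> output = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                proc.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                output.add(line);
            }
        }
        writer.join();
        int exit = proc.waitFor();
        if (exit != 0) {
            throw new IOException("command " + command + " exited with status " + exit);
        }
        if (writeError[0] != null) {
            throw writeError[0];
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: ExternalFilter <command> [args...]");
            return;
        }
        List<String> out = run(Arrays.asList(args),
                Arrays.asList("first record", "second record"));
        for (String line : out) {
            System.out.println("got: " + line);
        }
    }
}
```

For example, `java ExternalFilter python /path/to/script.py` would pipe two sample lines through the script; the script path is only a placeholder. Inside a map function you would parse the returned lines into the key/value pairs you want to emit.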

The tasks (both map and reduce types) provide entry and exit API
points (configure() and close() in the old mapred API, setup() and
cleanup() in the new mapreduce API). These let you spawn the process
before the map reads start and end it afterwards, so you can manage
your spawned process more cleanly.
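For the setup()/cleanup() pattern, one possible shape is a hypothetical helper (not part of any Hadoop API) that keeps a single child process alive for the whole task and exchanges one line per record. It assumes the script replies to every input line with exactly one output line and flushes after each reply; for a Python script that usually means running it with `-u` or calling sys.stdout.flush().

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Hypothetical helper: one long-lived child process per task.
 *   - construct it in Mapper.setup(Context)  (configure(JobConf) in the old API)
 *   - call send() once per record in map()
 *   - close() it in Mapper.cleanup(Context)  (close() in the old API)
 * Assumes the child answers each input line with exactly one flushed output line.
 */
public class LongLivedProcess implements Closeable {
    private final Process proc;
    private final BufferedWriter toChild;
    private final BufferedReader fromChild;

    public LongLivedProcess(List<String> command) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // child errors go to the task log
        proc = pb.start();
        toChild = new BufferedWriter(new OutputStreamWriter(
                proc.getOutputStream(), StandardCharsets.UTF_8));
        fromChild = new BufferedReader(new InputStreamReader(
                proc.getInputStream(), StandardCharsets.UTF_8));
    }

    /** Sends one record and blocks until the child replies with one line. */
    public String send(String record) throws IOException {
        toChild.write(record);
        toChild.write('\n');
        toChild.flush(); // the child sees nothing until we flush
        String reply = fromChild.readLine();
        if (reply == null) {
            throw new IOException("child process closed its stdout early");
        }
        return reply;
    }

    /** Closes the child's stdin (EOF), then waits for it to exit. */
    @Override
    public void close() throws IOException {
        toChild.close();
        try {
            int exit = proc.waitFor();
            if (exit != 0) {
                throw new IOException("child exited with status " + exit);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            proc.destroy();
            throw new IOException("interrupted while waiting for child", e);
        } finally {
            fromChild.close();
        }
    }
}
```

In a new-API Mapper you would construct it in setup(Context), call send(value.toString()) from map() and pass the reply on to context.write(...), and close() it in cleanup(Context). Starting the interpreter once per task rather than once per record avoids paying the Python start-up cost on every input line.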

On Sat, Jan 26, 2013 at 12:40 PM, Sundeep Kambhampati
<kambhamp@cse.ohio-state.edu> wrote:
> Is it possible to run a python script inside a Map function which is in
> java?
> I want to run a python script which is on my local disk and I want to use
> the output of that script for further processing in Map Function to produce
> <key/Value> Pairs.
> Can someone give me some idea how to do it?
> Regards
> Sundeep

Harsh J
