hadoop-hdfs-user mailing list archives

From Preethi Vinayak Ponangi <vinayakpona...@gmail.com>
Subject Re: Executing a Python program inside Map Function
Date Sat, 26 Jan 2013 16:55:07 GMT
It is possible to run a Python script from your map function; just make
sure the script is available to every task through the DistributedCache.

I think you are missing something in the design of this job, though. You are
assuming that your file is small enough that you can run the script on your
local file system and then use the processed output in your Hadoop job.
But what if the local file grows significantly in a few days? In that case,
you might be better off using the Python script as part of Hadoop Streaming.
Run the script through the Streaming API as the mapper so that it gets the
benefit of distributed processing.
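
As a rough sketch of what the streaming version could look like: Hadoop
Streaming feeds each input line to the script on stdin and expects
tab-separated key/value pairs on stdout. The process_line function below is
a hypothetical stand-in for whatever your existing script computes (here it
just emits each whitespace-separated token with a count of 1), and the
name run_mapper is my own, not part of any Hadoop API.

```python
#!/usr/bin/env python
import sys


def process_line(line):
    """Hypothetical placeholder for the real per-record logic.

    Yields (key, value) pairs; here, every whitespace-separated
    token in the line with a count of 1.
    """
    for token in line.strip().split():
        yield token, 1


def run_mapper(lines, out=sys.stdout):
    """Write one 'key<TAB>value' line per pair, as Streaming expects."""
    for line in lines:
        for key, value in process_line(line):
            out.write("%s\t%s\n" % (key, value))


if __name__ == "__main__":
    # Hadoop Streaming pipes each input split to the script on stdin.
    run_mapper(sys.stdin)
```

You would then pass the script to the streaming jar that ships with your
Hadoop distribution using the -mapper option (and -file to ship the script
to the task nodes), along with -input and -output; the jar's location
depends on your version and install.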

Hope this helps.

Vinayak.

On Sat, Jan 26, 2013 at 1:10 AM, Sundeep Kambhampati <
kambhamp@cse.ohio-state.edu> wrote:

> Is it possible to run a Python script inside a Map function that is
> written in Java?
>
> I want to run a Python script which is on my local disk, and I want to
> use the output of that script for further processing in the Map function to
> produce <key, value> pairs.
> Can someone give me some idea how to do it?
>
>
> Regards
> Sundeep
>
