hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: executing linux command from hadoop (python)
Date Fri, 16 Aug 2013 03:33:50 GMT
Yes, it would work with streaming, but note that if your os.system(…)
call produces any stdout output, it is treated as task output and
sent to HDFS/the reducers.

P.S. I assume the example you've shown is deliberately naive, but if it
is not, reconsider concatenating all those strings together. You don't
want to hold that much data in memory when running over large files, nor
would a command line support arguments as long as, say, a 64 MB input block.
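(One way to follow that advice is to bound memory by grouping input
lines into fixed-size batches and issuing one POST per batch. This is a
sketch under assumptions: the endpoint URL and the 1 MB batch size are
placeholders, not anything from the thread.)

```python
import os
import subprocess
import sys

BATCH_BYTES = 1 << 20  # ~1 MB per request; placeholder, tune as needed

def batches(lines, limit):
    """Group lines into joined chunks whose length reaches ~limit,
    so no single string grows to the size of the whole input split."""
    batch, size = [], 0
    for line in lines:
        batch.append(line)
        size += len(line)
        if size >= limit:
            yield "".join(batch)
            batch, size = [], 0
    if batch:
        yield "".join(batch)  # trailing partial batch

def main():
    stripped = (line.rstrip(os.linesep) for line in sys.stdin)
    for data in batches(stripped, BATCH_BYTES):
        # One bounded-size POST per batch (hypothetical endpoint).
        subprocess.call(["curl", "--silent", "--data", data,
                         "http://localhost"])

if __name__ == "__main__":
    main()
```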

On Fri, Aug 16, 2013 at 4:53 AM, jamal sasha <jamalshasha@gmail.com> wrote:
> Hi,
>  Let's say I have data that interacts with a REST API, like
> %curl hostname data
> Now, I have the following script:
> #!/usr/bin/env python
> import sys,os
> cmd = """curl http://localhost  --data  '"""
> string = " "
> for line in sys.stdin:
>     line = line.rstrip(os.linesep)
>     string += line
> os.system(cmd + string+"'")
> Now, if I give a sample file for data and run the above script with
> cat data.txt | python mapper.py
> it works perfectly. But will this work on Hadoop as well?
> I am trying to set up Hadoop in local mode to check it out, but I think it
> will take me some time to get there.
> Any experiences, suggestions?
> Thanks

Harsh J
