hadoop-common-dev mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: how to pass a hdfs file to a c++ process
Date Tue, 23 Aug 2011 14:48:10 GMT
Hadoop streaming is the simplest way to do this, if your program is set up to take its input on stdin, write its output to stdout, and each record (a "file" in your case) is a single line of text.

You need to be able to have it work with the following shell script

hadoop fs -cat <input_file> | head -1 | ./myprocess > output.txt

And ideally, what is stored in output.txt are lines of text whose order can be rearranged without impacting the result. (This is not a requirement unless you also want to use a reduce, but streaming will still try to parse the output that way.)

If not, there are tricks you can play to make it work, but they are kind of ugly.
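For reference, a streaming job that runs the binary as the mapper might be submitted roughly like this. This is a sketch, not a tested command: the paths, the input/output directories, and the exact name of the streaming jar are illustrative and will vary with your install (in hadoop-0.20.2 the jar ships under contrib/streaming).

```shell
# Submit a streaming job with the C++ binary as the mapper and no reducers.
# -file ships the local binary to each task node; paths are hypothetical.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-0.20.2.jar \
  -input /user/you/input \
  -output /user/you/output \
  -mapper ./myprocess \
  -file ./myprocess \
  -numReduceTasks 0
```

With zero reduce tasks, streaming writes the mapper output straight to the output directory, so the order-independence caveat above does not apply.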

--Bobby Evans

On 8/22/11 2:57 PM, "Zhixuan Zhu" <zzhu@calpont.com> wrote:

Hi All,

I'm using hadoop-0.20.2 to try out some simple tasks. I asked a question about FileInputFormat a few days ago and got some prompt replies from this forum, which helped a lot. Thanks again! Now I have another question. I'm trying to invoke a C++ process from my mapper for each hdfs file in the input directory to achieve some parallel processing.
But how do I pass the file to the program? I would want to do something
like the following in my mapper:

Process lChldProc = Runtime.getRuntime().exec("myprocess -file

How do I pass the hdfs filesystem to an outside process like that? Is
HadoopStreaming the direction I should go?
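If you do stay with Runtime.exec from the mapper, the usual pattern is not to pass a file name (the child process cannot read HDFS paths directly) but to open the file yourself and pipe its bytes into the child's stdin, then read the child's stdout. The sketch below shows just that piping mechanism; it is illustrative, not code from this thread. In a real mapper the InputStream would come from FileSystem.open() on the HDFS path, and "cat" stands in here for myprocess so the example is self-contained.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class PipeToProcess {
    // Feed a record's bytes to an external command's stdin and
    // return whatever the command writes to stdout.
    public static String pipe(byte[] record, String... cmd) throws Exception {
        Process child = new ProcessBuilder(cmd).start();
        // Write the input, then close so the child sees EOF.
        try (OutputStream toChild = child.getOutputStream()) {
            toChild.write(record);
        }
        // Collect the child's stdout.
        StringBuilder out = new StringBuilder();
        try (InputStream fromChild = child.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = fromChild.read(buf)) != -1) {
                out.append(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
        }
        child.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "one line of input\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(pipe(data, "cat"));
    }
}
```

This is essentially what Hadoop streaming does for you per record, which is why it is usually the less painful route.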

Thanks very much for any reply in advance.

