hadoop-common-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Hadoop streaming or pipes ..
Date Thu, 05 Apr 2012 21:00:16 GMT
It is a regular process, unless you explicitly say you want it to be Java, which would be a
bit odd to do, but possible.
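
As a rough illustration of the model described in this thread (a parent that launches an ordinary OS process and exchanges data with it over stdin/stdout), here is a minimal Python sketch. It is not Hadoop code: the function name `pipe_through_child` and the upper-casing child are hypothetical stand-ins for the task JVM and for whatever executable you hand to streaming.

```python
import subprocess
import sys

# Hypothetical child program: upper-cases every line it reads.
# It stands in for any regular executable (grep, sed, awk, a script, ...).
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
)


def pipe_through_child(records):
    """Send records to a separate OS process on stdin and read results from its stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],  # launch a regular child process
        input="".join(r + "\n" for r in records),  # data goes out over stdin
        capture_output=True,  # results come back over stdout
        text=True,
        check=True,
    )
    return proc.stdout.splitlines()


print(pipe_through_child(["alpha", "beta"]))
```

Every record crosses the process boundary twice, once on the way out and once on the way back, which is the kind of overhead discussed in the quoted message below.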


On 4/5/12 3:14 PM, "Mark question" <markq2011@gmail.com> wrote:

Thanks for the response, Robert. So the overhead will be in read/write
and communication. But is the spawned process a JVM or a regular process?


On Thu, Apr 5, 2012 at 12:49 PM, Robert Evans <evans@yahoo-inc.com> wrote:

> Both streaming and pipes do very similar things.  They will fork/exec a
> separate process that is running whatever you want it to run.  The JVM that
> is running Hadoop then communicates with this process to send the data over
> and get the processing results back.  The difference between streaming and
> pipes is that streaming uses stdin/stdout for this communication, so
> preexisting programs like grep, sed and awk can be used here.  Pipes uses
> a custom protocol with a C++ library to communicate.  The C++ library is
> tagged with SWIG-compatible data so that it can be wrapped to have APIs in
> other languages like Python or Perl.
> I am not sure what the performance difference is between the two, but in
> my own work I have seen a significant performance penalty from using either
> of them, because there is a somewhat large overhead of sending all of the
> data out to a separate process just to read it back in again.
> --Bobby Evans
> On 4/5/12 1:54 PM, "Mark question" <markq2011@gmail.com> wrote:
> Hi guys,
>  quick question:
>   Are there any performance gains from Hadoop streaming or pipes over
> Java? From what I've read, it's only to ease testing by using your favorite
> language. So I guess it is eventually translated to bytecode and then executed.
> Is that true?
> Thank you,
> Mark
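
To make the stdin/stdout contract mentioned in the thread concrete, here is a minimal sketch of a word-count style mapper in Python. The function name `map_words` is hypothetical, and the tab-separated "key, tab, value" line format is assumed here as the conventional streaming record layout. Nothing is translated to bytecode; the script simply runs as its own process.

```python
import io


def map_words(lines, out):
    """Write one "word<TAB>1" record per word, reading lines from any iterable."""
    for line in lines:
        for word in line.split():
            out.write(f"{word}\t1\n")


# Local demo with in-memory streams. As a streaming mapper, the script would
# instead call map_words(sys.stdin, sys.stdout) after importing sys.
demo_in = io.StringIO("the quick fox\nthe dog\n")
demo_out = io.StringIO()
map_words(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Because the mapper only reads lines and writes lines, it can be tested locally with a shell pipe before it is submitted to a cluster.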
