hadoop-common-user mailing list archives

From Ricky Ho <...@adobe.com>
Subject How many people are using Hadoop Streaming?
Date Fri, 03 Apr 2009 16:42:38 GMT
Has anyone benchmarked the performance difference between these ways of using Hadoop?
  1) Java vs C++
  2) Java vs Streaming

From looking at the Hadoop architecture, since the TaskTracker will fork a separate process
anyway to run the user-supplied map() and reduce() functions, I don't see a performance overhead
in using Hadoop Streaming (of course, the efficiency of the chosen script will be a factor,
but I think this is orthogonal).  On the other hand, I see a lot of benefits in using Streaming,
including:

  1) I can pick a language that offers a different programming paradigm (e.g. I may choose
a functional language, or logic programming, if it suits the problem better).  In fact, I can
even choose Erlang for the map() and Prolog for the reduce().  Mix and match can optimize me
  2) I can pick a language that I am familiar with, or one that I like.
  3) It is easy to switch to another language in a fine-grained, incremental way if I choose to do
so in the future.

Even if I am a Java programmer, I can still write a main() method that reads its data from standard
input and writes to standard output, and I don't see that I am losing much by doing that.  The benefit
is that my code can easily be moved to another language in the future.
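
To make the idea of a Java program driven purely by standard input and standard output concrete,
here is a minimal sketch of what such a mapper might look like.  It assumes a word-count style job,
which is my own choice for illustration; the class name StreamingWordCountMapper and the helper
map() are hypothetical.  The only Streaming convention it relies on is that each output line is a
key and a value separated by a tab.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical example: a word-count mapper that talks to Hadoop Streaming
// only through standard input and standard output.
public class StreamingWordCountMapper {

    // Reads input lines and writes one "word<TAB>1" record per word.
    static void map(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // Key and value are separated by a tab, one record per line.
                    out.print(word + "\t1\n");
                }
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        map(in, out);
    }
}
```

Because the program only sees lines of text, the same contract could later be met by a script in
any other language without touching the rest of the job.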

Am I missing something here?  Or is the majority of Hadoop applications already written using
Hadoop Streaming?

