hadoop-common-user mailing list archives

From "John DeTreville" <...@yahoo-inc.com>
Subject RE: Difference between Hadoop Streaming and "Normal" mode
Date Tue, 12 Aug 2008 22:33:57 GMT
I think you will find that the Streaming model buys you convenience,
but costs you performance and generality. I'll let others quantify
how much of each.

Cheers,
John

-----Original Message-----
From: vedagaurav@gmail.com [mailto:vedagaurav@gmail.com] On Behalf Of
Gaurav Veda
Sent: Tuesday, August 12, 2008 3:10 PM
To: core-user@hadoop.apache.org
Subject: Difference between Hadoop Streaming and "Normal" mode

Hi All,

This might seem too silly, but I couldn't find a satisfactory answer
to this yet. What are the advantages / disadvantages of using Hadoop
Streaming over the normal mode (wherein you write your own mapper and
reducer in Java)? From what I gather, the real advantage of Hadoop
Streaming is that you can use any executable (written in C, Perl,
Python, etc.) as a mapper or reducer.
A slight disadvantage is that, by default, the executable reads from
standard input and writes to standard output, though one can specify
one's own input and output formats (and package them with the default
Hadoop Streaming jar file).
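
To make the stdin/stdout contract concrete, here is a minimal sketch of
a word-count mapper and reducer written as plain Python functions. It
is an illustration only, not code from this thread: the names
map_lines and reduce_lines and the "map"/"reduce" command-line argument
are hypothetical, and it assumes the usual Streaming convention of
tab-separated key/value lines, with the reducer receiving its input
already sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per key; assumes lines are sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: run the script with "map" or "reduce" as its
# single argument so one file can serve as either executable.
if __name__ == "__main__" and len(sys.argv) == 2:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    sys.stdout.writelines(record + "\n" for record in step(sys.stdin))
```

Because both steps only read lines and write lines, they can be
checked locally by piping a file through the mapper, sort, and the
reducer before submitting anything to a cluster.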

My point is, why should I ever use the normal mode? Streaming seems
just as good. Is there a performance problem, do I have only limited
control over my job in streaming mode, or is there some other issue?

Thanks!
Gaurav
-- 
Share what you know, learn what you don't !
