hadoop-common-user mailing list archives

From Karl Anderson <...@monkey.org>
Subject Re: Difference between Hadoop Streaming and "Normal" mode
Date Wed, 20 Aug 2008 00:51:36 GMT

On 12-Aug-08, at 4:28 PM, lohit wrote:

> To add to this,
> If you use streaming you would be operating on Text fields. If you
> have sequence files you would have to have your own input format to
> convert them and deal with the format in your scripts, while with a
> Java implementation it's trivial. There is a performance hit if you
> use streaming, but other than that you should be able to do most of
> the stuff. A lot of applications use streaming.

I haven't used this myself yet, but the docs suggest that you can
change the input and output formats with command-line parameters to
the hadoop job (-inputformat, -outputformat).

http://hadoop.apache.org/core/docs/current/streaming.html#Specifying+Other+Plugins+for+Jobs

http://hadoop.apache.org/core/docs/current/streaming.html#How+do+I+provide+my+own+input%2Foutput+format+with+streaming%3F
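
For illustration only, here is an untested sketch in Python of how such a job might be put together. It only assembles the command line and shows how a streaming script could split the tab-separated records it reads on stdin. The jar path, the HDFS paths, and the script names are hypothetical placeholders, and the use of org.apache.hadoop.mapred.SequenceFileAsTextInputFormat for sequence files is my assumption, so check it against the docs linked above.

```python
def build_streaming_command(streaming_jar, input_path, output_path,
                            mapper, reducer,
                            input_format=None, output_format=None):
    """Assemble the argv for a Hadoop Streaming job. Does not run it."""
    cmd = ["hadoop", "jar", streaming_jar,
           "-input", input_path,
           "-output", output_path,
           "-mapper", mapper,
           "-reducer", reducer]
    # Only pass the format options when the caller overrides the defaults.
    if input_format:
        cmd += ["-inputformat", input_format]
    if output_format:
        cmd += ["-outputformat", output_format]
    return cmd


def parse_record(line):
    """Split one streaming input line into (key, value) at the first tab."""
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


if __name__ == "__main__":
    # All paths and script names below are hypothetical placeholders.
    cmd = build_streaming_command(
        "hadoop-streaming.jar",
        "/data/in", "/data/out",
        "mapper.py", "reducer.py",
        # Assumption: this class reads sequence files as text records.
        input_format="org.apache.hadoop.mapred.SequenceFileAsTextInputFormat",
    )
    print(" ".join(cmd))
```

The idea is that the input format class does the conversion to text on the Java side, so the script itself only ever sees lines of key, tab, value.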

It looks like you can even write your own streamers (as Java-callable  
classes) if you make your own streaming jar and pack them into it  
(although I haven't tried this either).
