hadoop-common-user mailing list archives

From Owen O'Malley <omal...@apache.org>
Subject Re: Passing information to Map Reduce
Date Fri, 13 Aug 2010 22:15:26 GMT

On Aug 13, 2010, at 12:55 PM, Pete Tyler wrote:

> I have only found two options, neither of which I really like,
> 1. Encode information in the job name string - a bit hokey and  
> limited to strings

I'd state this as: encode the information into a string and add it to  
the JobConf. Look at the Base64 class if you need to encode binary  
data. This is the easiest approach, but it causes problems if the  
JobConf grows much above 2 MB or so.
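That first option can be sketched roughly like this. This is a minimal, stand-alone illustration (not Owen's code): it uses java.util.Base64 from the JDK rather than a Hadoop-bundled Base64 class, and the class name and the conf key mentioned in the comments are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class JobConfCodec {

    // Serialize any Serializable object and encode the bytes as a
    // Base64 string, so it can be stored as a plain String value.
    static String encode(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Reverse the encoding: Base64 string back to the original object.
    static Object decode(String s) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(s);
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Integer> params = new HashMap<>();
        params.put("threshold", 42);

        String encoded = encode((Serializable) params);
        // In a real job you would now do something like:
        //   conf.set("my.job.params", encoded);          // driver side
        //   decode(conf.get("my.job.params"));           // task side
        Object roundTripped = decode(encoded);
        System.out.println(params.equals(roundTripped)); // prints "true"
    }
}
```

The 2 MB caveat above is why this only suits small parameter sets: the encoded string travels inside the JobConf to every task.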

> 2. Persist the information, which changes from job to job - if every  
> map task and every reduce task has to read one piece of specific,  
> persisted data that may be stored on another node won't this have  
> significant performance implications?

This is generally the preferred strategy. In particular, the framework  
supports the "distributed cache" which will cause files from HDFS to  
be downloaded to each node before the tasks run. The files will only  
be downloaded once for each node. Files in the distributed cache can  
be a couple GB without huge performance problems.
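The shape of that is roughly the following. This is a sketch against the old (0.20-era) mapred API, not tested code; the HDFS path, class names, and field names are invented for illustration.

```java
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class CacheSketch {

    // Driver side: register the HDFS file before submitting the job.
    public static void setupJob(JobConf conf) throws Exception {
        DistributedCache.addCacheFile(new URI("/user/pete/lookup.dat"), conf);
    }

    // Task side: each task finds a node-local copy in configure().
    // The framework fetches the file from HDFS at most once per node,
    // not once per task, which is why this scales.
    public static class MyMapper extends MapReduceBase {

        private Path localLookup;

        @Override
        public void configure(JobConf job) {
            try {
                Path[] cached = DistributedCache.getLocalCacheFiles(job);
                localLookup = cached[0]; // local-filesystem path to lookup.dat
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
        // map() would then read from localLookup as an ordinary local file.
    }
}
```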

-- Owen
