hadoop-common-dev mailing list archives

From Owen O'Malley <o...@yahoo-inc.com>
Subject Re: Global information in mapreduce
Date Tue, 20 Mar 2007 15:52:43 GMT

On Mar 19, 2007, at 10:08 PM, Alejandro Abdelnur wrote:

> you could write your word set to a file in DFS somewhere outside of
> the input directory and read it at map init time (within the
> configure() method). you could pass the path to file as a
> configuration property.

On a side note, if the files are large (or the maps are short), it can
make sense to use the local file cache. See
org.apache.hadoop.filecache.DistributedCache; in particular, look at
setCacheFiles. Basically, you configure it with a URL, and the task
tracker copies the file down and caches it locally.
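
As a rough illustration of the read-at-init step, here is a minimal sketch in plain Java. It assumes the word set is stored one word per line, which the thread does not specify, and WordSetLoader is a hypothetical helper, not a Hadoop class. In a real job the reader would wrap the locally cached copy of the file, and the Hadoop calls named in the comments should be checked against the javadoc for your release.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of Hadoop): parses a word list with
// one word per line. The file format is an assumption.
final class WordSetLoader {
    private WordSetLoader() {}

    // Reads every line, trims it, skips blank lines, and returns the
    // distinct words.
    static Set<String> load(BufferedReader in) throws IOException {
        Set<String> words = new HashSet<>();
        String line;
        while ((line = in.readLine()) != null) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the cached file. Inside a mapper's configure(),
        // the reader would instead open the local path that the
        // framework reports for the cached file (believed to be
        // DistributedCache.getLocalCacheFiles(conf); verify for your
        // version), after the job setup registered the file with
        // DistributedCache.setCacheFiles.
        BufferedReader in =
            new BufferedReader(new StringReader("apple\nbanana\n\napple\n"));
        Set<String> words = load(in);
        System.out.println(words.contains("banana"));
    }
}
```

Loading the set once per task and keeping it in a field means each map() call only does an in-memory lookup.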

-- Owen
