hadoop-common-user mailing list archives

From Mori Bellamy <mbell...@apple.com>
Subject Re: Is it possible to input two different files under same mapper
Date Fri, 11 Jul 2008 20:41:50 GMT
Hey Amer,
It sounds to me like you're going to have to write your own input
format (or at least modify an existing one). Take a look here:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileSplit.html

I'm not sure how you'd go about doing this, but I hope this helps you.

(Also, have you considered preprocessing your input so that any
arbitrary mapper can know whether or not it's looking at a line from
the "large file"?)
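
Here is a rough sketch of what I mean by preprocessing, written in Python only for illustration. The tag values and the function names tag_lines and split_tagged are made up for this example and are not part of Hadoop; the idea is just to prefix every line with a marker for its source before the job runs, so the mapper can tell the two kinds of input apart.

```python
# Hypothetical source tags (an assumption for this sketch, not a Hadoop convention).
LARGE_TAG = "L"
SMALL_TAG = "S"


def tag_lines(lines, tag):
    """Preprocessing step: prefix each line with its source tag and a tab."""
    for line in lines:
        yield f"{tag}\t{line.rstrip(chr(10))}"


def split_tagged(record):
    """Mapper-side step: return (is_from_large_file, original_line)."""
    # Split only at the first tab so tabs inside the original line survive.
    tag, _, line = record.partition("\t")
    return tag == LARGE_TAG, line
```

You would run tag_lines once over the large file with LARGE_TAG and once over each small file with SMALL_TAG, then have the mapper call split_tagged on each record it receives. The cost is one extra pass over the data, which may or may not be acceptable for a TB-sized file.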
On Jul 11, 2008, at 12:31 PM, Muhammad Ali Amer wrote:

> Hi,
> My requirement is to compare the contents of one very large file (GB  
> to TB size) with a bunch of smaller files (100s of MB to GB  sizes).  
> Is there a way I can give the mapper the 1st file independently of  
> the remaining bunch?
> Amer

