hadoop-common-user mailing list archives

From Jason Venner <ja...@attributor.com>
Subject Re: Is it possible to input two different files under same mapper
Date Mon, 14 Jul 2008 13:47:52 GMT
This sounds like a good task for the Data Join code.
If you can set things up so that all of your data is stored in MapFiles, 
with the same key type, the same partitioning setup, and the same number 
of partitions, it will go very well.
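
As a rough illustration of why matching keys and partitioning matter, here is a minimal sketch in plain Python. It is not the Hadoop Data Join API; the names merge_join, partition and partitioned_join are hypothetical, and it assumes each input has at most one record per key. When both inputs are split into the same number of partitions by the same function and each partition is sorted by key, partition i of one input only ever needs to be compared with partition i of the other, in a single pass.

```python
import zlib
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (key, value); hypothetical record shape


def merge_join(left: Iterable[Record],
               right: Iterable[Record]) -> Iterator[Tuple[str, str, str]]:
    """Join two inputs that are each sorted by key (unique keys assumed)."""
    left_iter, right_iter = iter(left), iter(right)
    l = next(left_iter, None)
    r = next(right_iter, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left_iter, None)   # key only in left, skip it
        elif l[0] > r[0]:
            r = next(right_iter, None)  # key only in right, skip it
        else:
            yield (l[0], l[1], r[1])    # key present in both inputs
            l = next(left_iter, None)
            r = next(right_iter, None)


def partition(records: Iterable[Record], num_partitions: int) -> List[List[Record]]:
    """Split records by a deterministic hash of the key, then sort each part."""
    parts: List[List[Record]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        parts[index].append((key, value))
    return [sorted(p) for p in parts]


def partitioned_join(left: Iterable[Record], right: Iterable[Record],
                     num_partitions: int) -> List[Tuple[str, str, str]]:
    """Join partition i of left only against partition i of right."""
    joined: List[Tuple[str, str, str]] = []
    for lp, rp in zip(partition(left, num_partitions),
                      partition(right, num_partitions)):
        joined.extend(merge_join(lp, rp))
    return joined
```

If the two inputs used different partition counts or different key types, matching keys could land in different partitions and the pairwise pass above would miss them.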

Mori Bellamy wrote:
> Hey Amer,
> It sounds to me like you're going to have to write your own input 
> format (or at least modify an existing one). Take a look here:
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileSplit.html

> I'm not sure how you'd go about doing this, but I hope this helps you.
> (Also, have you considered preprocessing your input so that any 
> arbitrary mapper can know whether or not it's looking at a line from 
> the "large file"?)
> On Jul 11, 2008, at 12:31 PM, Muhammad Ali Amer wrote:
>> Hi,
>> My requirement is to compare the contents of one very large file (GB 
>> to TB size) with a bunch of smaller files (100s of MB to GB sizes). 
>> Is there a way I can give the mapper the 1st file independently of 
>> the remaining bunch?
>> Amer
Jason Venner
Attributor - Program the Web <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers and coding wizards, contact if 
