hadoop-common-dev mailing list archives

From Upendra Dadi <ud...@gmu.edu>
Subject Re: how to set one map task for each input key-value pair
Date Fri, 27 Nov 2009 12:34:31 GMT
Thank you Jason! How about if I fix the size of each record to the size of 
the largest record by padding the rest of the records with dummy characters, 
and then call setMaxInputSplitSize() and setMinInputSplitSize() on the 
FileInputFormat class with this value? The mapper will extract the input after 
ignoring the dummy characters. Do you think this could work? Thanks.
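The padding idea above can be sketched without Hadoop at all. This is a minimal, illustrative simulation only: the class and method names (RecordPadding, pad, unpad, DUMMY) are hypothetical, not part of any Hadoop API. In a real job one would additionally pass the fixed record size to FileInputFormat.setMinInputSplitSize() and setMaxInputSplitSize(), as described in the message.

```java
// Hadoop-free sketch of the padding scheme: every record is padded
// with a dummy character up to the length of the largest record, so
// each record occupies exactly one fixed-size slot; the mapper would
// then strip the padding before processing. All names are illustrative.
public class RecordPadding {
    static final char DUMMY = '\0'; // assumed padding character that never
                                    // appears inside real record data

    // Pad a record up to fixedSize with DUMMY characters.
    static String pad(String record, int fixedSize) {
        StringBuilder sb = new StringBuilder(record);
        while (sb.length() < fixedSize) {
            sb.append(DUMMY);
        }
        return sb.toString();
    }

    // What the mapper would do: drop the trailing padding.
    static String unpad(String padded) {
        int end = padded.indexOf(DUMMY);
        return end < 0 ? padded : padded.substring(0, end);
    }

    public static void main(String[] args) {
        String[] records = {"short record", "a considerably longer record"};
        // The fixed slot size is the length of the largest record.
        int fixedSize = 0;
        for (String r : records) {
            fixedSize = Math.max(fixedSize, r.length());
        }
        for (String r : records) {
            String padded = pad(r, fixedSize);
            System.out.println(padded.length() + " -> " + unpad(padded));
        }
    }
}
```

One caveat with this scheme: if records range from a few megabytes to hundreds of megabytes, padding everything to the largest size can inflate the input data considerably.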


----- Original Message ----- 
From: "Jason Venner" <jason.hadoop@gmail.com>
To: <common-dev@hadoop.apache.org>
Sent: Friday, November 27, 2009 12:06 AM
Subject: Re: how to set one map task for each input key-value pair

> The only thing that comes immediately to mind is to write your own custom
> input format that knows how to tell where the boundaries are in your data
> set, and uses those to specify the beginning and end of the input splits.
> You can also tell the framework not to split your individual input files
> by setting the minimum input split size (mapred.min.split.size) to a value
> larger than your largest input file.
> On Thu, Nov 26, 2009 at 4:53 PM, Upendra Dadi <udadi@gmu.edu> wrote:
>> Hi,
>>  I am trying to use MapReduce with some scientific data. I have key-value
>> pairs such that the size of the value can range from few megabytes to
>> several hundreds of megabytes. What happens when the size of the value
>> exceeds block size? How do I set it up so that each key-value pair is
>> associated with a separate map? Please someone help. Thanks.
>> Regards,
>> Upendra
> -- 
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
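Jason's second suggestion, raising the minimum split size above the largest input file, could look like the following job-configuration fragment. The 1 GB value is only an illustrative assumption; it should be chosen to exceed the largest file in the input.

```xml
<!-- Hypothetical configuration snippet: keep each input file in a
     single split by making the minimum split size larger than any
     input file. The 1 GB value (1073741824 bytes) is an example. -->
<property>
  <name>mapred.min.split.size</name>
  <value>1073741824</value>
</property>
```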
