hadoop-common-dev mailing list archives

From Jason Venner <jason.had...@gmail.com>
Subject Re: how to set one map task for each input key-value pair
Date Sat, 28 Nov 2009 17:51:05 GMT
I don't have direct experience with setMaxInputSplitSize. The process sounds
feasible.
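
The padding idea from the quoted message below can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions: the class `FixedWidthSplits`, its methods, and the zero pad byte are hypothetical names chosen for the example, and real records are assumed never to end with the pad byte. It shows that once every record has the same width, splits of exactly that width line up one-to-one with records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: models fixed-width records and
// the equal-sized splits that would cover them.
final class FixedWidthSplits {
    // Assumed pad byte; real records must never end with it.
    static final byte PAD = 0;

    // Pad a record on the right up to the fixed width.
    static byte[] pad(byte[] record, int width) {
        if (record.length > width) {
            throw new IllegalArgumentException("record longer than fixed width");
        }
        byte[] out = Arrays.copyOf(record, width);
        Arrays.fill(out, record.length, width, PAD);
        return out;
    }

    // Strip trailing pad bytes, as the mapper would before processing.
    static byte[] unpad(byte[] padded) {
        int end = padded.length;
        while (end > 0 && padded[end - 1] == PAD) {
            end--;
        }
        return Arrays.copyOf(padded, end);
    }

    // Return {offset, length} pairs, one per fixed-width record.
    static List<long[]> splits(long fileLength, long recordWidth) {
        if (recordWidth <= 0 || fileLength % recordWidth != 0) {
            throw new IllegalArgumentException("file is not a whole number of records");
        }
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += recordWidth) {
            result.add(new long[] {offset, recordWidth});
        }
        return result;
    }
}
```

Two caveats apply. The padded file grows to the largest record size times the number of records, and the record reader used by the job would still have to read exactly one fixed-width record per split, which this sketch does not model.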


On Fri, Nov 27, 2009 at 4:34 AM, Upendra Dadi <udadi@gmu.edu> wrote:

> Thank you, Jason! What if I pad every record to the size of the largest
> record by appending dummy characters, and then set both
> setMaxInputSplitSize() and setMinInputSplitSize() of the FileInputFormat
> class to that size? The mapper would strip the dummy characters before
> processing the input. Do you think this could work? Thanks.
>
> Regards,
> Upendra
>
>
> ----- Original Message ----- From: "Jason Venner" <jason.hadoop@gmail.com>
> To: <common-dev@hadoop.apache.org>
> Sent: Friday, November 27, 2009 12:06 AM
> Subject: Re: how to set one map task for each input key-value pair
>
>
>
>> The only thing that comes immediately to mind is to write your own custom
>> input format that knows how to tell where the boundaries are in your data
>> set, and uses those to specify the beginning and end of the input splits.
>>
>> You can also tell the framework not to split your individual input files
>> by setting the minimum input split size (mapred.min.split.size) to
>> Long.MAX_VALUE.
>>
>> On Thu, Nov 26, 2009 at 4:53 PM, Upendra Dadi <udadi@gmu.edu> wrote:
>>
>>> Hi,
>>> I am trying to use MapReduce with some scientific data. I have key-value
>>> pairs in which the size of the value can range from a few megabytes to
>>> several hundred megabytes. What happens when the size of a value
>>> exceeds the block size? How do I set things up so that each key-value pair
>>> is handled by a separate map task? Could someone please help? Thanks.
>>>
>>> Regards,
>>> Upendra
>>>
>>>
>>
>>
>> --
>> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
>> http://www.amazon.com/dp/1430219424?tag=jewlerymall
>> www.prohadoopbook.com a community for Hadoop Professionals
>>
>>
>

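As for why setting mapred.min.split.size to Long.MAX_VALUE (suggested in the quoted reply above) keeps each file in one piece: the sketch below models the split-size rule that FileInputFormat is generally understood to apply, namely max(minSize, min(maxOrGoalSize, blockSize)). The class `SplitSizeSketch` and its method names are hypothetical, and the model ignores the small slop factor Hadoop allows on the final split, so treat it as an approximation rather than the framework's exact behavior.

```java
// Hypothetical model of the split-size rule; not Hadoop code.
final class SplitSizeSketch {
    // Assumed rule: the minimum size wins over both the block size and the
    // maximum (or goal) size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits needed to cover a file, using overflow-safe
    // ceiling division.
    static long countSplits(long fileLength, long splitSize) {
        if (fileLength == 0) {
            return 0;
        }
        return (fileLength - 1) / splitSize + 1;
    }
}
```

Under this model, a 300 MB file on 64 MB blocks is cut into five splits with a minimum size of 1 byte, but stays as a single split when the minimum is Long.MAX_VALUE, because the computed split size then exceeds any possible file length.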
