hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Control the file splits size
Date Mon, 23 Aug 2010 19:32:41 GMT
Ah yes I overlooked that part, sorry. I haven't tried out custom
splits yet, so can't comment further on what may be going down.
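
For fixed-size binary records, the alignment idea from the original
question quoted below (forcing the split size to a multiple of the object
size) can be sketched without any Hadoop dependency. This is only a rough
sketch under stated assumptions: every record has the same size, and the
class and method names (AlignedSplits, alignSplitSize, planSplits,
countRecords) are hypothetical helpers, not Hadoop APIs. It checks that
aligned splits cover every record exactly once, which is one way to narrow
down where objects might be going missing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): plans splits aligned to fixed-size records. */
public class AlignedSplits {

    /** Rounds the requested split size down to a whole number of records (at least one). */
    static long alignSplitSize(long requestedSize, long recordSize) {
        if (recordSize <= 0) {
            throw new IllegalArgumentException("recordSize must be positive");
        }
        long recordsPerSplit = Math.max(1, requestedSize / recordSize);
        return recordsPerSplit * recordSize;
    }

    /** Returns {offset, length} pairs that cover the file from byte 0 to fileLength. */
    static List<long[]> planSplits(long fileLength, long requestedSize, long recordSize) {
        long splitSize = alignSplitSize(requestedSize, recordSize);
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            splits.add(new long[] {offset, Math.min(splitSize, fileLength - offset)});
        }
        return splits;
    }

    /** Counts whole records over all splits; fails if a split starts inside a record. */
    static long countRecords(List<long[]> splits, long recordSize) {
        long total = 0;
        for (long[] split : splits) {
            if (split[0] % recordSize != 0) {
                throw new IllegalStateException("split starts mid-record at offset " + split[0]);
            }
            total += split[1] / recordSize;
        }
        return total;
    }

    public static void main(String[] args) {
        long recordSize = 24;                 // assumed fixed object size in bytes
        long fileLength = 1000 * recordSize;  // a file holding 1000 objects
        List<long[]> splits = planSplits(fileLength, 5000, recordSize);
        System.out.println(splits.size() + " splits covering "
                + countRecords(splits, recordSize) + " records");
    }
}
```

If a plan like this covers all records but the job output still loses
objects, the record reader is the next place to look: it has to honor each
split's start offset and length rather than assume line-oriented input.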

On Tue, Aug 24, 2010 at 12:44 AM, Michael Segel
<michael_segel@hotmail.com> wrote:
>
>
> Uhm...
>
> There may be more to the initial question.
>
> The OP indicated that this was a 'binary file' and that the records may not be based on an end-of-line.
> So he may want to look at how to handle different types of input too.
>
>
>> From: qwertymaniac@gmail.com
>> Date: Mon, 23 Aug 2010 18:39:48 +0530
>> Subject: Re: Control the file splits size
>> To: common-user@hadoop.apache.org
>>
>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html#isSplitable(org.apache.hadoop.fs.FileSystem,
>> org.apache.hadoop.fs.Path)
>>
>> isSplitable is the method you're looking for -- return false from
>> it in your custom input format (derived from FileInputFormat or similar).
>>
>> On Mon, Aug 23, 2010 at 4:08 PM, Teodor Macicas <teodor.macicas@epfl.ch> wrote:
>> > Hi all,
>> >
>> > Can anyone please tell me how to control the split size? I have one big
>> > file which will be split according to the number of maps. The input file is binary
>> > and contains some objects. I do not want to split an object into 2 separate
>> > files, for sure.
>> > I overrode the computeSplitSize() method and forced the size to be a
>> > multiple of my object size. It worked, but it seems that at certain points
>> > of the output file objects are missing. And now I am thinking that this
>> > could be my problem.
Your output file is a result of MR, if I am correct? Can you verify at
the input of your mapper whether your objects are being read properly based
on the split you've computed for it?
>> >
>> > Has anyone faced this problem before?
>> >
>> > Thank you.
>> > Regards,
>> > Teodor
>> >
>>
>>
>>
>> --
>> Harsh J
>> www.harshj.com
>



-- 
Harsh J
www.harshj.com
