hadoop-mapreduce-user mailing list archives

From Sugandha Naolekar <sugandha....@gmail.com>
Subject Re: Logic of isSplittable() of class FileInputFormat
Date Wed, 26 Feb 2014 10:31:44 GMT
Oh, OK. Thanks. So basically, to be on the safe side, one can always have
it return false and keep the record data consistent. I mean, the
length of all the records should be the same.

--
Thanks & Regards,
Sugandha Naolekar





On Wed, Feb 26, 2014 at 3:57 PM, Dieter De Witte <drdwitte@gmail.com> wrote:

> No. An example could be records that span a variable number of lines. If
> you then allowed the file to be split, a record might be broken across
> splits, so you could override isSplittable to always return false.
>
>
> 2014-02-26 11:22 GMT+01:00 Sugandha Naolekar <sugandha.n87@gmail.com>:
>
>> So basically what I can deduce from it is, isSplittable() only applies to
>> stream compressed files. Right?
>>
>> --
>> Thanks & Regards,
>> Sugandha Naolekar
>>
>>
>>
>>
>>
>> On Wed, Feb 26, 2014 at 2:06 PM, Jeff Zhang <jezhang@gopivotal.com> wrote:
>>
>>> Hi Sugandha,
>>>
>>> Take a gz file as an example. It is not splittable because of the
>>> compression algorithm it uses. There is no guarantee that one record is
>>> located in one block; if a record spans 2 blocks, your program will crash
>>> since it cannot get the whole record.
>>>
>>>
>>>
>>>
>>> On Wed, Feb 26, 2014 at 1:24 PM, Sugandha Naolekar <
>>> sugandha.n87@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> If a single 129 MB file is stored as two HDFS blocks because the max
>>>> block size is 128 MB, and each of the blocks is read according to the
>>>> InputFormat in use, then what is the significance of the
>>>> isSplittable() method?
>>>>
>>>> If it is set to false, will the entire block be considered a single input
>>>> split? How will TextInputFormat react to it?
>>>>
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Sugandha Naolekar
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
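
To make the 129 MB question from the thread concrete, here is a rough, Hadoop-free sketch of how split boundaries could be derived for one file. It is a simplified model and not the library's code: the class name SplitSketch and the method computeSplits are made up for illustration, the split size is assumed to equal the 128 MB block size, and an empty file is simply given no splits. The 1.1 slack factor mirrors the SPLIT_SLOP constant in Hadoop's FileInputFormat. Note that the real hook is spelled isSplitable (one "t"); in the mapreduce API it is a protected method taking a JobContext and a Path, and it is modeled here as a plain boolean.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {
    // Slack factor as in Hadoop's FileInputFormat: the last split may be
    // up to 10% larger than the nominal split size.
    static final double SPLIT_SLOP = 1.1;

    /**
     * Returns {offset, length} pairs for the input splits of a single file.
     * The boolean stands in for what isSplitable(...) would return.
     */
    static List<long[]> computeSplits(long fileLength, long splitSize, boolean splittable) {
        List<long[]> splits = new ArrayList<>();
        if (fileLength == 0) {
            return splits; // simplification made for this sketch
        }
        if (!splittable) {
            // Not splittable: the whole file becomes one split,
            // no matter how many HDFS blocks it occupies.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        long remaining = fileLength;
        // Carve off full-size splits while clearly more than one split remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // Whatever is left (up to 1.1 x splitSize) becomes the final split.
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        for (long sizeMb : new long[] {129, 300}) {
            for (boolean splittable : new boolean[] {true, false}) {
                List<long[]> splits = computeSplits(sizeMb * mb, 128 * mb, splittable);
                System.out.println(sizeMb + " MB, splittable=" + splittable
                        + " -> " + splits.size() + " split(s)");
            }
        }
    }
}
```

Under these assumptions the 129 MB file yields a single split even when it is splittable, because 129/128 is below the 1.1 slack, while a 300 MB file yields three splits (128, 128 and 44 MB) when splittable and one when not. The single split is the cost of always returning false: one mapper reads the whole file, including any blocks stored on other nodes. TextInputFormat itself reports a file as splittable when it is uncompressed or when its codec implements SplittableCompressionCodec, which gzip does not.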
