hadoop-user mailing list archives

From Dieter De Witte <drdwi...@gmail.com>
Subject Re: Logic of isSplittable() of class FileInputFormat
Date Wed, 26 Feb 2014 10:27:33 GMT
No. For example, records may span a variable number of lines. If you allowed
such a file to be split, a record could be broken across two splits, so you
would override isSplitable() (spelled with one "t" in the Hadoop API) to
always return false.
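
To make the effect concrete, here is a minimal sketch in plain Java of how a
splittable flag changes the splits computed for the 129 MB file from the
question. It is a simplified model and not Hadoop's code: the class
SplitSketch and the method computeSplits are hypothetical names, and the real
FileInputFormat.getSplits() also takes the configured minimum and maximum
split sizes and other details into account, so its split counts can differ.
In Hadoop itself the hook to override is FileInputFormat.isSplitable().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free model of split computation (not Hadoop code).
class SplitSketch {

    // Returns the splits as {offset, length} pairs, both in bytes.
    static List<long[]> computeSplits(long fileLength, long splitSize, boolean splittable) {
        List<long[]> splits = new ArrayList<>();
        if (fileLength <= 0) {
            return splits; // empty file: nothing to read in this model
        }
        if (!splittable) {
            // Not splittable: the whole file becomes one split, so a single
            // mapper reads it from start to end even if it spans several blocks.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        // Splittable: cut the file into consecutive ranges of at most splitSize bytes.
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        for (boolean splittable : new boolean[] {true, false}) {
            List<long[]> splits = computeSplits(129 * mb, 128 * mb, splittable);
            System.out.println("splittable=" + splittable + " splits=" + splits.size());
            for (long[] s : splits) {
                System.out.println("  offset=" + s[0] + " length=" + s[1]);
            }
        }
    }
}
```

In this model the splittable case yields two ranges (128 MB and 1 MB), while
the non-splittable case yields a single 129 MB range covering both blocks, so
a record can never be cut at a split boundary.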


2014-02-26 11:22 GMT+01:00 Sugandha Naolekar <sugandha.n87@gmail.com>:

> So basically what I can deduce from it is, isSplittable() only applies to
> stream compressed files. Right?
>
> --
> Thanks & Regards,
> Sugandha Naolekar
>
>
>
>
>
> On Wed, Feb 26, 2014 at 2:06 PM, Jeff Zhang <jezhang@gopivotal.com> wrote:
>
>> Hi Sugandha,
>>
>> Take a gz file as an example. It is not splittable because of the
>> compression algorithm it uses. There is no guarantee that one record is
>> located in one block; if a record spans 2 blocks, your program will crash
>> since you cannot get the whole record.
>>
>>
>>
>>
>> On Wed, Feb 26, 2014 at 1:24 PM, Sugandha Naolekar <
>> sugandha.n87@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Suppose a single file of size 129 MB is split into two HDFS blocks,
>>> since the max block size is 128 MB, and each of the blocks is read
>>> according to the InputFormat in use. What, then, is the significance of
>>> the isSplittable() method?
>>>
>>> If it is set to false, will the entire block be considered a single
>>> input split? How will TextInputFormat react to it?
>>>
>>>
>>> --
>>> Thanks & Regards,
>>> Sugandha Naolekar
>>>
>>>
>>>
>>>
>>
>
