hadoop-mapreduce-user mailing list archives

From Sugandha Naolekar <sugandha....@gmail.com>
Subject Re: Logic of isSplittable() of class FileInputFormat
Date Wed, 26 Feb 2014 10:22:54 GMT
So basically, what I can deduce from this is that isSplittable() only matters
for stream-compressed files. Right?
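
To make the question concrete, here is a minimal sketch in plain Java, with no Hadoop dependency, of how I understand split computation to depend on the flag. The names `SplitSketch`, `Split` and `computeSplits` are hypothetical and are not Hadoop APIs. In Hadoop itself the method is spelled `isSplitable` on `FileInputFormat`, and as far as I know `TextInputFormat` returns false from it for files compressed with a non-splittable codec such as gzip. The sketch is a simplified model: it ignores the configurable minimum/maximum split sizes and the tolerance the real implementation applies to a small trailing remainder.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of file-based split computation (NOT Hadoop code).
class SplitSketch {

    // Hypothetical stand-in for an input split: a byte range within one file.
    static final class Split {
        final long offset;
        final long length;

        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "Split(offset=" + offset + ", length=" + length + ")";
        }
    }

    // Returns the byte ranges that would each be handed to one map task.
    // Assumes fileLength > 0 and splitSize > 0.
    static List<Split> computeSplits(long fileLength, long splitSize, boolean splittable) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength and splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        if (!splittable) {
            // Not splittable: the whole FILE (not a single block) is one split,
            // so one mapper reads it from the first byte to the last.
            splits.add(new Split(0, fileLength));
            return splits;
        }
        // Splittable: carve the file into ranges of at most splitSize bytes.
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            splits.add(new Split(offset, Math.min(splitSize, fileLength - offset)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(computeSplits(129 * mb, 128 * mb, true));
        System.out.println(computeSplits(129 * mb, 128 * mb, false));
    }
}
```

In this model, the 129 MB file with a 128 MB split size yields two splits (128 MB and 1 MB) when splittable, and a single 129 MB split covering both HDFS blocks when not.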

--
Thanks & Regards,
Sugandha Naolekar





On Wed, Feb 26, 2014 at 2:06 PM, Jeff Zhang <jezhang@gopivotal.com> wrote:

> Hi Sugandha,
>
> Take a gz file as an example. It is not splittable because of the
> compression algorithm it uses. There is no guarantee that a record is
> located entirely in one block; if a record spans 2 blocks, your program will
> crash since you cannot get the whole record.
>
>
>
>
> On Wed, Feb 26, 2014 at 1:24 PM, Sugandha Naolekar <sugandha.n87@gmail.com
> > wrote:
>
>> Hello,
>>
>> Suppose a single file of size 129 MB is split into two HDFS blocks, as the
>> max block size is 128 MB, and each of the blocks is read according to the
>> InputFormat in use. What is the significance of the isSplittable() method
>> then?
>>
>> If it is set to false, will the entire block be considered a single input
>> split? How will TextInputFormat react to it?
>>
>>
>> --
>> Thanks & Regards,
>> Sugandha Naolekar
>>
>>
>>
>>
>
