hadoop-hdfs-user mailing list archives

From Jeff Zhang <jezh...@gopivotal.com>
Subject Re: Logic of isSplittable() of class FileInputFormat
Date Wed, 26 Feb 2014 08:36:55 GMT
Hi Sugandha,

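Take a gz file as an example. It is not splittable because of the compression
algorithm it uses: a gzip stream can only be decompressed from the beginning,
so a reader cannot start in the middle of the file. More generally, nothing
guarantees that a record lies within one block. If a record spans 2 blocks and
each block is processed on its own, your program will fail because it cannot
get the whole record.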
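
To make the effect of the flag concrete, here is a rough sketch of how a
file-based input format turns one file into input splits. It is a simplified
model written from my reading of FileInputFormat.getSplits(), not the Hadoop
source itself: the class SplitSketch and the method computeSplits are
hypothetical names, and block locations, min/max split size settings and
compression codecs are left out. In Hadoop the actual hook is spelled
isSplitable() (one "t") and is overridden in the InputFormat subclass.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, Hadoop-free model of split computation (hypothetical names).
class SplitSketch {

    // Assumption: mirrors the 10% slop used by FileInputFormat, so that a
    // small trailing remainder is merged into the last split.
    static final double SPLIT_SLOP = 1.1;

    /** Returns {offset, length} pairs describing the splits of one file. */
    static List<long[]> computeSplits(long fileLength, long splitSize, boolean splittable) {
        List<long[]> splits = new ArrayList<>();
        if (fileLength == 0) {
            return splits;
        }
        if (!splittable) {
            // Not splittable: one split covers the whole file, no matter
            // how many HDFS blocks the file occupies.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        long remaining = fileLength;
        // Cut full-size splits while clearly more than one split remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // Whatever is left (at most 1.1 x splitSize) becomes the last split.
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] fileSizes = {129 * mb, 300 * mb};
        for (long size : fileSizes) {
            for (boolean splittable : new boolean[] {true, false}) {
                int count = computeSplits(size, 128 * mb, splittable).size();
                System.out.println((size / mb) + " MB, splittable=" + splittable
                        + ": " + count + " split(s)");
            }
        }
    }
}
```

In this model a 300 MB file with a 128 MB split size gives three splits
(128 MB, 128 MB, 44 MB) when splittable and a single 300 MB split when not.
The 129 MB file from the question ends up as one split either way, because
the 1 MB remainder falls within the slop. A single split over a multi-block
file is still read in full by one mapper, which fetches the later blocks over
the network if they are not local.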

On Wed, Feb 26, 2014 at 1:24 PM, Sugandha Naolekar wrote:

> Hello,
> Suppose a single file of size 129 MB is split into two HDFS blocks because
> the max block size is 128 MB, and each of the blocks is read according to
> the InputFormat in use. What, then, is the significance of the
> isSplittable() method?
> If it is set to false, will the entire block be considered a single input
> split? How will TextInputFormat react to it?
> --
> Thanks & Regards,
> Sugandha Naolekar
