hadoop-mapreduce-user mailing list archives

From Dieter De Witte <drdwi...@gmail.com>
Subject Re: Logic of isSplittable() of class FileInputFormat
Date Wed, 26 Feb 2014 11:04:58 GMT
If you have a simple one-line record format, you should allow files to be
split, since your simulations will be better balanced.
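
To illustrate why one-line records tolerate splitting, here is a minimal
sketch in plain Java (no Hadoop dependency). It assumes the usual convention
for line-oriented readers: a reader whose split does not start at offset 0
skips the first (possibly partial) line, and every reader finishes the last
line it started even if that line runs past the end of its split. The class
and method names (LineSplitSketch, linesForSplit, nextLineStart) are
hypothetical and are not part of the Hadoop API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineSplitSketch {

    // Hypothetical helper: lines emitted by a reader assigned bytes [start, end).
    static List<String> linesForSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Skip the first (possibly partial) line; the reader of the
            // previous split is responsible for emitting it.
            pos = nextLineStart(data, pos);
        }
        // Emit every line that begins at or before 'end', reading past 'end'
        // if needed so the last line is returned whole.
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos);
            boolean endsWithNewline = next > pos && data[next - 1] == '\n';
            int lineEnd = endsWithNewline ? next - 1 : next;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }

    // Offset of the byte just after the next '\n', or data.length if none.
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    public static void main(String[] args) {
        byte[] data = "a1\nb22\nc333\nd4\n".getBytes(StandardCharsets.UTF_8);
        // The boundary at byte 5 falls in the middle of the record "b22".
        System.out.println(linesForSplit(data, 0, 5));
        System.out.println(linesForSplit(data, 5, data.length));
    }
}
```

Each line is emitted exactly once even though the boundary cuts a record in
half. A format whose records span a variable number of lines has no such
cheap way to resynchronise, which is the case where refusing to split makes
sense.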


2014-02-26 11:31 GMT+01:00 Sugandha Naolekar <sugandha.n87@gmail.com>:

> Oh. Ok. Thanks. So basically, to be on the safer side, one can always set
> its value as false and keep the data of records consistent. I mean, the
> length of all the records should be the same.
>
> --
> Thanks & Regards,
> Sugandha Naolekar
>
>
>
>
>
> On Wed, Feb 26, 2014 at 3:57 PM, Dieter De Witte <drdwitte@gmail.com> wrote:
>
>> No. For example, records may span a variable number of lines. If you then
>> allow a file to be split, a record may be broken across two splits, so in
>> that case you could override isSplittable to always return false.
>>
>>
>> 2014-02-26 11:22 GMT+01:00 Sugandha Naolekar <sugandha.n87@gmail.com>:
>>
>>> So basically what I can deduce from it is, isSplittable() only applies to
>>> stream compressed files. Right?
>>>
>>> --
>>> Thanks & Regards,
>>> Sugandha Naolekar
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Feb 26, 2014 at 2:06 PM, Jeff Zhang <jezhang@gopivotal.com> wrote:
>>>
>>>> Hi Sugandha,
>>>>
>>>> Take a gz file as an example. It is not splittable because of the
>>>> compression algorithm it uses. There is no guarantee that one record is
>>>> located in one block; if a record spans 2 blocks, your program will crash
>>>> since you cannot get the whole record.
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Feb 26, 2014 at 1:24 PM, Sugandha Naolekar <
>>>> sugandha.n87@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> If a single 129 MB file is split into two HDFS blocks, since the max
>>>>> block size is 128 MB, and each of the blocks is read depending on the
>>>>> InputFormat it supports, then what is the significance of the
>>>>> isSplittable() method?
>>>>>
>>>>> If it is set to false, will the entire block be considered a single
>>>>> input split? How will TextInputFormat react to it?
>>>>>
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Sugandha Naolekar
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
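
For the 129 MB question at the start of the thread, the effect of the flag
can be sketched with a simplified model in plain Java. The boolean parameter
stands in for whatever isSplittable() returns (the method is spelled
isSplitable() in the Hadoop API). The class and method names (SplitSketch,
splitLengths) are hypothetical, and the model assumes the split size equals
the block size. As far as I know, Hadoop's own FileInputFormat also lets the
last split run slightly over the split size, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {

    // Hypothetical helper: lengths in bytes of the input splits for one file.
    static List<Long> splitLengths(long fileLength, long splitSize, boolean splittable) {
        List<Long> lengths = new ArrayList<>();
        if (fileLength == 0) {
            return lengths;
        }
        if (!splittable) {
            // Not splittable: the whole file becomes one split, even though
            // it is still stored as several HDFS blocks.
            lengths.add(fileLength);
            return lengths;
        }
        long remaining = fileLength;
        while (remaining > splitSize) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        lengths.add(remaining);
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(splitLengths(129 * mb, 128 * mb, true));
        System.out.println(splitLengths(129 * mb, 128 * mb, false));
    }
}
```

Under this model the splittable case gives two splits (128 MB and 1 MB),
while the non-splittable case gives a single 129 MB split, so one task reads
the whole file regardless of how many blocks it occupies.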
