hadoop-common-dev mailing list archives

From "Tim" <tcsiw...@dons.usfca.edu>
Subject Re: Change proposal for FileInputFormat isSplitable
Date Fri, 30 May 2014 22:44:17 GMT
Remove

On Fri, May 30, 2014 at 3:03 PM, Niels Basjes <Niels@basjes.nl> wrote:

> Hi,
> The way I see the effects of the original patch on existing subclasses:
> - implemented isSplitable
>    --> no performance difference.
> - did not implement isSplitable
>    --> then there is no performance difference if the container is either
> not compressed or uses a splittable compression.
>    --> If it uses a common non-splittable compression (like gzip) then the
> output will suddenly be different (which is the correct answer) and the
> jobs will finish sooner because the input is not processed multiple times.
> Where do you see a performance impact?
> Niels
> On May 30, 2014 8:06 PM, "Doug Cutting" <cutting@apache.org> wrote:
>> On Thu, May 29, 2014 at 2:47 AM, Niels Basjes <Niels@basjes.nl> wrote:
>> > For arguments I still do not fully understand this was rejected by Todd
>> > and Doug.
>>
>> Performance is a part of compatibility.
>>
>> Doug
>>
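
To make the cases Niels lists concrete, here is a minimal, self-contained sketch of the decision being discussed. It does not use the Hadoop API: the class SplitabilitySketch, its methods, and the extension-based check are hypothetical stand-ins for a FileInputFormat subclass and its isSplitable hook, and the rule "gzip is not splittable, everything else is" is a simplification assumed for illustration only.

```java
import java.util.Set;

// Hypothetical model of the split decision; NOT the Hadoop API.
final class SplitabilitySketch {

    // Assumption for this sketch: only gzip is treated as non-splittable.
    private static final Set<String> NON_SPLITTABLE_EXTENSIONS = Set.of(".gz");

    // Behaviour of a subclass that never implemented the check:
    // every file is reported as splittable.
    static boolean alwaysSplitable(String fileName) {
        return true;
    }

    // Behaviour with a compression-aware check: a file is splittable
    // unless its extension marks a non-splittable compression.
    static boolean compressionAwareSplitable(String fileName) {
        String lower = fileName.toLowerCase();
        for (String ext : NON_SPLITTABLE_EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return false;
            }
        }
        return true;
    }

    // Number of input splits for a file of the given length in bytes.
    // A non-splittable file is always read as a single split.
    static long splitCount(boolean splitable, long length, long splitSize) {
        if (length <= 0) {
            return 0;
        }
        if (!splitable) {
            return 1;
        }
        return (length + splitSize - 1) / splitSize; // ceiling division
    }
}
```

With a 128 MiB split size, a 1 GiB file named logs.gz yields 8 splits under alwaysSplitable and 1 split under compressionAwareSplitable, while an uncompressed logs.txt yields 8 splits under both. Only the gzip case changes, which is the case the thread is arguing about.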