hadoop-common-dev mailing list archives

From Niels Basjes <Ni...@basjes.nl>
Subject Re: Change proposal for FileInputFormat isSplitable
Date Fri, 30 May 2014 22:02:58 GMT

The way I see the effects of the original patch on existing subclasses:
- The subclass implements isSplitable
   --> no performance difference.
- The subclass does not implement isSplitable
   --> no performance difference if the container is either not compressed
or uses a splittable compression.
   --> if it uses a common non-splittable compression (like gzip), the
output will suddenly be different (it will now be the correct answer) and
the jobs will finish sooner because the input is no longer processed
multiple times.
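
To make the cases above concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: the class SplitabilitySketch, the methods oldDefault and proposedDefault, and the extension-based check are all hypothetical stand-ins, and the actual patch may decide splitability differently (for example by looking up the compression codec). It assumes the current default always answers "splittable" and that gzip (".gz") is the only non-splittable format of interest.

```java
import java.util.List;

public class SplitabilitySketch {
    // Assumption: gzip is the only non-splittable compression modelled here.
    static final List<String> NON_SPLITTABLE_EXTENSIONS = List.of(".gz");

    // Models the current default for subclasses that do not override
    // isSplitable: every file is treated as splittable.
    static boolean oldDefault(String fileName) {
        return true;
    }

    // Models the proposed default: a file is splittable unless its name
    // ends with a known non-splittable compression extension.
    static boolean proposedDefault(String fileName) {
        for (String ext : NON_SPLITTABLE_EXTENSIONS) {
            if (fileName.endsWith(ext)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Uncompressed, splittable compression (bzip2), non-splittable (gzip).
        for (String f : new String[] {"data.txt", "data.bz2", "data.gz"}) {
            System.out.println(f + " old=" + oldDefault(f)
                    + " proposed=" + proposedDefault(f));
        }
    }
}
```

Only the gzip case changes its answer between the two defaults, which is the one case where I expect both the output and the run time to change.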

Where do you see a performance impact?

On May 30, 2014 8:06 PM, "Doug Cutting" <cutting@apache.org> wrote:

> On Thu, May 29, 2014 at 2:47 AM, Niels Basjes <Niels@basjes.nl> wrote:
> > For arguments I still do not fully understand this was rejected by Todd
> > and Doug.
> Performance is a part of compatibility.
> Doug
