hadoop-mapreduce-user mailing list archives

From Greg Roelofs <roel...@yahoo-inc.com>
Subject Re: concatenated gzip support: default on or not?
Date Thu, 17 Jun 2010 20:52:31 GMT
> As some folks have found out the hard way, only the first member of a
> concatenated gzip file is recognized by current versions of Hadoop,
> including trunk; the remainder is silently ignored.  I'm working on
> the fix (MAPREDUCE-469), and the question has come up whether to make
> the fixed version the default, which would represent a behavior change.

> So, three options:

> (1) configurable; concatenation support not enabled by default
> (2) configurable; concatenation support enabled by default (behavior change)
> (3) not configurable; concatenation support always enabled (behavior change)
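
To make the quoted problem concrete, here is a minimal sketch in Python (not Hadoop code, and not the MAPREDUCE-469 patch) of the difference between a reader that stops after the first gzip member and one that keeps going. The function names first_member_only and all_members are hypothetical, chosen only for this illustration; the sketch assumes nothing beyond the standard gzip and zlib modules.

```python
import gzip
import zlib

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def first_member_only(data: bytes) -> bytes:
    """Decode just the first gzip member; trailing members are ignored.

    This mimics the silent truncation described above.
    """
    d = zlib.decompressobj(GZIP_WBITS)
    return d.decompress(data)


def all_members(data: bytes) -> bytes:
    """Decode every member of a concatenated gzip stream, in order."""
    out = []
    while data:
        d = zlib.decompressobj(GZIP_WBITS)
        out.append(d.decompress(data))
        out.append(d.flush())
        if not d.eof:
            raise ValueError("truncated gzip member")
        # Bytes after the end of this member belong to the next one.
        data = d.unused_data
    return b"".join(out)


# Two independently compressed members, joined byte for byte
# (the same result as `cat a.gz b.gz > both.gz`).
concatenated = gzip.compress(b"first\n") + gzip.compress(b"second\n")

print(first_member_only(concatenated))
print(all_members(concatenated))
```

The first call returns only the contents of the first member and raises no error, which is why the data loss is easy to miss. The second call returns both members' contents concatenated.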

Not a vast amount of feedback, but the consensus is clearly for enabling
concatenation support by default, and there doesn't even seem to be any
real interest in making it configurable.

So the next version of the patch (incorporating informal review feedback
and dealing with most of my own FIXMEs) will go with (2) just because it's
trivial to do so, but if I don't hear any arguments against by, say, early
next week, the subsequent (final?) version will hardcode it (option (3)) to
match the bzip2 behavior.

