hadoop-common-dev mailing list archives

From "Niels Basjes (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-7076) Splittable Gzip
Date Thu, 23 Dec 2010 10:11:02 GMT
Splittable Gzip

                 Key: HADOOP-7076
                 URL: https://issues.apache.org/jira/browse/HADOOP-7076
             Project: Hadoop Common
          Issue Type: New Feature
          Components: io
            Reporter: Niels Basjes

Files compressed with the gzip codec are not splittable due to the nature of the codec.
This limits your options for scaling out when reading large gzipped input files.

Given that gunzipping a 1GiB file usually takes only 2 minutes, I figured that for some
use cases wasting some resources may result in a shorter job time under certain conditions.
So having each split read the entire input file from the start (wasting resources!!), and
keep only the part that belongs to that split, may lead to additional scalability.
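
As a rough illustration of the idea only (not the proposed Hadoop implementation), the
sketch below uses Python's standard gzip module. The helper name read_split and the
fixed-size byte splits are assumptions made for this example; a real input format would
also have to align splits to record boundaries, which is ignored here. Each split
decompresses the stream from the beginning, throws away everything before its own start
offset, and then keeps only its own byte range.

```python
import gzip
import io

CHUNK = 64 * 1024  # read size used while skipping and collecting


def read_split(gz_bytes: bytes, start: int, end: int) -> bytes:
    """Return uncompressed bytes [start, end), decompressing from offset 0.

    Hypothetical helper: the decompression work before `start` is wasted
    on purpose, which is the trade-off described in the issue.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        # Decompress and discard everything that precedes this split.
        to_skip = start
        while to_skip > 0:
            chunk = stream.read(min(to_skip, CHUNK))
            if not chunk:  # file is shorter than `start`
                return b""
            to_skip -= len(chunk)

        # Collect this split's own bytes, stopping at `end` or end of file.
        parts = []
        to_read = end - start
        while to_read > 0:
            chunk = stream.read(min(to_read, CHUNK))
            if not chunk:
                break
            parts.append(chunk)
            to_read -= len(chunk)
        return b"".join(parts)


# Small self-check with made-up data: 1000 fixed-width records.
data = b"".join(b"record %06d\n" % i for i in range(1000))
compressed = gzip.compress(data)

# Four equal byte ranges over the uncompressed data (assumed split layout).
split_size = len(data) // 4
splits = [(i * split_size, (i + 1) * split_size) for i in range(4)]

# Each split could be processed by a separate task, independently of the others.
pieces = [read_split(compressed, s, e) for s, e in splits]
assert b"".join(pieces) == data
```

In this sketch the split covering the last quarter of the file decompresses the whole
file, so total decompression work grows with the number of splits, while the work done
on each split's own data can proceed in parallel.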

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
