hadoop-mapreduce-issues mailing list archives

From "Greg Roelofs (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1795) add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2)
Date Tue, 18 May 2010 01:28:43 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Roelofs updated MAPREDUCE-1795:
------------------------------------

     Original Estimate: 336h
    Remaining Estimate: 336h
     Affects Version/s: 0.20.2
           Description: 
When running MapReduce with concatenated gzip files as input, only the first part ("member"
in gzip spec parlance, http://www.ietf.org/rfc/rfc1952.txt) is read; the remainder is silently
ignored.  As a first step toward fixing that, this issue will add a configurable option to
throw an error in such cases.

MAPREDUCE-469 is the tracker for the more complete fix/feature, whenever that occurs.

  was:
When running MapReduce with concatenated gzip files as input only the first part is read,
which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage
and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)



> add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1795
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1795
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.2
>            Reporter: Greg Roelofs
>            Assignee: Ravi Gummadi
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> When running MapReduce with concatenated gzip files as input, only the first part ("member"
> in gzip spec parlance, http://www.ietf.org/rfc/rfc1952.txt) is read; the remainder is silently
> ignored.  As a first step toward fixing that, this issue will add a configurable option to
> throw an error in such cases.
> MAPREDUCE-469 is the tracker for the more complete fix/feature, whenever that occurs.
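
The behaviour described in the issue can be reproduced outside Hadoop. The following is a minimal Python sketch, not Hadoop code: it decodes only the first gzip member of a concatenated input and, when asked, raises an error if bytes are left unconsumed. The names read_first_member, fail_on_trailing, TrailingDataError and make_concatenated are hypothetical and chosen for illustration; they do not correspond to any Hadoop class or configuration key.

```python
import gzip
import zlib


class TrailingDataError(ValueError):
    """Raised when input remains after the first gzip member (hypothetical name)."""


def read_first_member(data: bytes, fail_on_trailing: bool = False) -> bytes:
    """Decode only the first gzip member, mimicking the reported behaviour.

    With fail_on_trailing=False the remainder is silently ignored.
    With fail_on_trailing=True unconsumed input raises TrailingDataError.
    """
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    payload = decoder.decompress(data) + decoder.flush()
    # Bytes that follow the end of the first member are left in unused_data.
    leftover = decoder.unused_data
    if leftover and fail_on_trailing:
        raise TrailingDataError(
            f"{len(leftover)} bytes of input were not consumed"
        )
    return payload


def make_concatenated() -> bytes:
    """Build a two-member gzip stream, equivalent to `cat a.gz b.gz`."""
    return gzip.compress(b"first\n") + gzip.compress(b"second\n")


if __name__ == "__main__":
    data = make_concatenated()
    # Silent mode: only the first member's content comes back.
    print(read_first_member(data))
    # Strict mode: the leftover second member is reported as an error.
    try:
        read_first_member(data, fail_on_trailing=True)
    except TrailingDataError as exc:
        print("error:", exc)
```

For comparison, Python's own gzip.decompress reads every member of such a stream, which is the kind of complete handling that MAPREDUCE-469 tracks.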

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

