hadoop-common-dev mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3131) enabling BLOCK compression for map outputs breaks the reduce progress counters
Date Tue, 10 Jun 2008 23:33:45 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-3131:

    Affects Version/s:     (was: 0.16.1)
               Status: Patch Available  (was: Open)

The problem was that SequenceFile.Sorter.MergeQueue calculates progress as (total size of
keys and values read) / (total size of files to be merged on disk). When a file is compressed,
its on-disk size is much smaller than the combined size of the keys and values, so the ratio
can exceed 100%. There is also a problem when compression is turned off: the code reports
progress below 100% because it does not count bytes in the file that are not part of keys
and values, such as the header and length fields. This patch changes MergeQueue to use the
position in the input stream to calculate the number of bytes read from disk and divides
that by the total amount of data to be merged.
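A minimal sketch of the two progress formulas described above; the class and method names are illustrative, not Hadoop's actual API:

```java
// Hypothetical sketch of the progress bug and fix discussed in HADOOP-3131.
// Names are illustrative; this is not the real MergeQueue code.
public class MergeProgress {

    // Old approach: uncompressed key+value bytes read, divided by the
    // on-disk file size. With BLOCK compression the decompressed bytes
    // can far exceed the file size, so progress overshoots 100%.
    static float oldProgress(long keyValueBytesRead, long totalFileBytes) {
        return (float) keyValueBytesRead / totalFileBytes;
    }

    // Patched approach: use the input stream's position (actual bytes
    // consumed from disk), which is in the same units as the file size,
    // so the ratio stays in [0, 1].
    static float newProgress(long streamPosition, long totalFileBytes) {
        return (float) streamPosition / totalFileBytes;
    }

    public static void main(String[] args) {
        long totalOnDisk = 1_000_000;   // compressed file size on disk
        long decompressed = 4_000_000;  // key+value bytes after decompression
        System.out.println(oldProgress(decompressed, totalOnDisk)); // exceeds 1.0
        System.out.println(newProgress(250_000, totalOnDisk));      // stays within [0, 1]
    }
}
```

The same stream-position approach also fixes the uncompressed case, since header and length bytes advance the stream position even though they are not key or value data.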

> enabling BLOCK compression for map outputs breaks the reduce progress counters
> ------------------------------------------------------------------------------
>                 Key: HADOOP-3131
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3131
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.0, 0.17.1, 0.18.0
>            Reporter: Colin Evans
>         Attachments: Picture 1.png
> Enabling map output compression and setting the compression type to BLOCK causes the
> progress counters during the reduce to go crazy and report progress counts over 100%.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
