hadoop-mapreduce-issues mailing list archives

From "Gera Shegalov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk
Date Mon, 01 Dec 2014 05:32:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229432#comment-14229432 ]

Gera Shegalov commented on MAPREDUCE-6166:

Thanks for commenting [~eepayne]!
bq. Since OnDiskMapOutput is shuffling the whole IFile to disk, the checksum is needed later
during the last merge pass when the IFile contents are read again and decompressed.

Can you clarify where in the code it's required to keep the original checksum?

What I see is that, after your modifications, {{OnDiskMapOutput}} is guaranteed to validate
the contents of the destination buffer against the remote checksum. The contents are then
written out using {{LocalFileSystem}}, which will again create an on-disk checksum because
it is based on {{ChecksumFileSystem}}. Are you proposing an optimization where the checksum
is not computed twice when shuffling straight to disk, by using {{RawLocalFileSystem}}? Can
we defer that to another JIRA?
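To make the point concrete, here is a minimal stand-in sketch of the validation step being discussed: the reducer checks the fetched map-output bytes against the checksum the map side shipped with them, before spilling to disk. This uses plain {{java.util.zip.CRC32}} for illustration only; Hadoop's real IFile checksum handling lives in {{IFileInputStream}}, and the class and method names below are hypothetical, not the actual Hadoop API:

```java
import java.util.zip.CRC32;

public class ShuffleChecksumSketch {
    // Validate a fetched map-output buffer against the checksum sent by
    // the map side, before the bytes are spilled to local disk.
    // (Illustrative stand-in for OnDiskMapOutput's validation.)
    static boolean validate(byte[] fetched, long remoteChecksum) {
        CRC32 crc = new CRC32();
        crc.update(fetched, 0, fetched.length);
        return crc.getValue() == remoteChecksum;
    }

    public static void main(String[] args) {
        byte[] data = "map output segment".getBytes();
        CRC32 sender = new CRC32();
        sender.update(data, 0, data.length);
        long remote = sender.getValue();

        // An uncorrupted transfer passes validation...
        System.out.println(validate(data, remote));   // true

        // ...while a single flipped byte is caught at shuffle time,
        // instead of hours later during the final merge pass.
        data[0] ^= 0xFF;
        System.out.println(validate(data, remote));   // false
    }
}
```

The second on-disk checksum then arises separately, because writing through {{LocalFileSystem}} (a {{ChecksumFileSystem}}) creates a {{.crc}} sidecar file; writing through {{RawLocalFileSystem}} would skip that step.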

> Reducers do not catch bad map output transfers during shuffle if data shuffled directly
to disk
> -----------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-6166
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6166
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>         Attachments: MAPREDUCE-6166.v1.201411221941.txt, MAPREDUCE-6166.v2.201411251627.txt
> In very large map/reduce jobs (50000 maps, 2500 reducers), the intermediate map partition
output gets corrupted on disk on the map side. If this corrupted map output is too large to
shuffle in memory, the reducer streams it to disk without validating the checksum. In jobs
this large, it could take hours before the reducer finally tries to read the corrupted file
and fails. Since retries of the failed reduce attempt will also take hours, this delay in
discovering the failure is multiplied greatly.

This message was sent by Atlassian JIRA
