hadoop-hdfs-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1264) 0.20: OOME in HDFS client made an unrecoverable HDFS block
Date Wed, 23 Jun 2010 19:50:51 GMT
0.20: OOME in HDFS client made an unrecoverable HDFS block

                 Key: HDFS-1264
                 URL: https://issues.apache.org/jira/browse/HDFS-1264
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: data-node, hdfs client
    Affects Versions: 0.20-append
            Reporter: Todd Lipcon
             Fix For: 0.20-append

Ran into a bad issue in testing overnight. One of the writers experienced an OOME in the middle
of writing a checksum chunk to the stream inside a sync() call. The client then proceeded to retry
recovery on each DN in the pipeline, but each recovery failed because the client's internal checksum
buffer was borked in some way - on the DNs I see "Unexpected checksum mismatch" errors after
each recovery attempt.

When another client tried to recover the file using appendFile, it got the "Partial CRC 3766269197
does not match value computed the  last time file was closed" error (plus there was only one
replica left in targets). It thus failed to set up the append pipeline, and ran into HDFS-1262.
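
For readers unfamiliar with the check behind that error, here is a minimal sketch in Python. It is
not the actual DataNode code. It assumes a CRC-32 is kept per fixed-size chunk (512 bytes here, an
assumed value) and that, on append, the CRC of the trailing partial chunk is recomputed from the
block data and compared with the stored one. The function names are hypothetical.

```python
import zlib

# Assumed chunk size; treat it as an illustrative value, not a confirmed setting.
BYTES_PER_CHECKSUM = 512


def partial_chunk_crc(block_data: bytes, bytes_per_checksum: int = BYTES_PER_CHECKSUM) -> int:
    """Return the CRC-32 of the trailing partial chunk of block_data.

    If the block length is an exact multiple of the chunk size, the partial
    chunk is empty and its CRC-32 is 0.
    """
    partial_len = len(block_data) % bytes_per_checksum
    tail = block_data[len(block_data) - partial_len:]
    return zlib.crc32(tail) & 0xFFFFFFFF


def check_append_crc(block_data: bytes, stored_partial_crc: int) -> int:
    """Hypothetical append-time check: recompute the partial-chunk CRC from the
    data on disk and refuse to continue if it differs from the stored value."""
    computed = partial_chunk_crc(block_data)
    if computed != stored_partial_crc:
        raise IOError(
            f"Partial CRC {computed} does not match stored value {stored_partial_crc}"
        )
    return computed


if __name__ == "__main__":
    data = b"x" * 700                      # one full chunk plus 188 trailing bytes
    good_crc = partial_chunk_crc(data)
    check_append_crc(data, good_crc)       # consistent data and checksum: passes

    # Simulate a writer whose checksum buffer was corrupted before it was sent.
    bad_crc = good_crc ^ 0x1
    try:
        check_append_crc(data, bad_crc)
    except IOError as err:
        print("append refused:", err)
```

Under these assumptions, a checksum that was written from a corrupted client-side buffer can never
match the data, so every later recovery or append attempt on that replica fails the same way.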

This was on 0.20-append, though it may happen on trunk as well.

