hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet
Date Mon, 04 Apr 2016 14:58:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224294#comment-15224294 ]

Kihwal Lee commented on HDFS-10178:

[~vinayrpet], sorry, my description of the problem was confusing. I was mixing up the meta
file header and the non-payload protobuf fields.  After a connection is made and the command
is parsed, a {{BlockReceiver}} is created and {{createRbw()}} is called before the first
packet is handled. This creates a meta file containing only the header. If this file is used
for transferring the block, the checksum type is lost.
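
To make the failure mode concrete, here is a minimal, self-contained Java sketch. It is not
Hadoop code: the class and method names are hypothetical, the 7-byte header layout (2-byte
version, 1-byte checksum type, 4-byte bytes-per-checksum) and the type codes are assumptions,
and the sender's rule of ignoring a header that has no checksum data after it is an assumed
behavior, chosen only to show how the type could be lost during a transfer.

```java
import java.nio.ByteBuffer;

public class HeaderOnlyMetaSketch {
    // Assumed header layout: 2-byte version + 1-byte checksum type + 4-byte bytesPerChecksum.
    static final int HEADER_SIZE = 2 + 1 + 4;

    // Assumed checksum type codes, for illustration only.
    static final int TYPE_NULL = 0;
    static final int TYPE_CRC32C = 2;

    // Models what the text says createRbw() leaves behind: a meta file with the header only.
    static byte[] writeHeaderOnlyMeta(int checksumType, int bytesPerChecksum) {
        return ByteBuffer.allocate(HEADER_SIZE)
                .putShort((short) 1)          // version
                .put((byte) checksumType)     // checksum type
                .putInt(bytesPerChecksum)     // bytes per checksum chunk
                .array();
    }

    // Hypothetical sender: parses the header only when checksum data follows it,
    // otherwise falls back to "no checksum". This rule is an assumption.
    static int checksumTypeSeenBySender(byte[] meta) {
        if (meta.length <= HEADER_SIZE) {
            return TYPE_NULL;
        }
        ByteBuffer buf = ByteBuffer.wrap(meta);
        buf.getShort();                       // skip version
        return buf.get() & 0xff;              // checksum type
    }

    public static void main(String[] args) {
        byte[] meta = writeHeaderOnlyMeta(TYPE_CRC32C, 512);
        int seen = checksumTypeSeenBySender(meta);
        System.out.println("meta length = " + meta.length
                + ", type written = " + TYPE_CRC32C
                + ", type seen by sender = " + seen);
    }
}
```

Under these assumptions the header on disk records type 2, but the sender reports type 0, so a
node that receives the transferred block would no longer agree with the client about the
checksum type.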

> Permanent write failures can happen if pipeline recoveries occur for the first packet
> -------------------------------------------------------------------------------------
>                 Key: HDFS-10178
>                 URL: https://issues.apache.org/jira/browse/HDFS-10178
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, HDFS-10178.v3.patch, HDFS-10178.v4.patch
> We have observed that writes fail permanently if the first packet doesn't go through
> properly and pipeline recovery happens. If the packet header is sent out, but the data portion
> of the packet does not reach one or more datanodes in time, the pipeline recovery will be
> done against the 0-byte partial block.
> If additional datanodes are added, the block is transferred to the new nodes.  After
> the transfer, each node will have a meta file containing the header and a 0-length block
> data file. The pipeline recovery seems to work correctly up to this point, but the write fails
> when the actual data packet is resent.

This message was sent by Atlassian JIRA
