hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6964) NN fails to fix under replication leading to data loss
Date Tue, 21 Oct 2014 16:54:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178625#comment-14178625 ]

Kihwal Lee commented on HDFS-6964:

Retargeting to 2.7.0 for now. If the issue is reproduced and found to be critical, we may
pull the fix into 2.6.x.

> NN fails to fix under replication leading to data loss
> ------------------------------------------------------
>                 Key: HDFS-6964
>                 URL: https://issues.apache.org/jira/browse/HDFS-6964
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Priority: Blocker
> We've encountered lost blocks due to node failure even when there was ample time to fix
> the under-replication.
> 2 nodes were lost.  The 3rd node, which held the last remaining replicas, averaged 1 block
> copy per heartbeat (3s) until ~7h later, when that node was also lost, resulting in over
> 50 lost blocks.  When the node was restarted and sent its BR, the NN immediately began
> fixing the replication.
> In another data loss event, over 150 blocks were lost due to node failure, but the timing
> of the node loss is not known, so, unlike the first case, there may have been inadequate
> time to fix the under-replication.
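
For a rough sense of scale in the first event, the reported rate can be turned into a block count. This is only a back-of-the-envelope sketch: it assumes the heartbeat interval was exactly 3s and the window exactly 7h, and the function name and parameters are hypothetical, not part of HDFS.

```python
def blocks_copied(duration_hours: float,
                  heartbeat_seconds: float = 3.0,
                  blocks_per_heartbeat: float = 1.0) -> int:
    """Estimate how many block copies happen over a time window.

    Assumes a constant heartbeat interval and a constant average number
    of block copies per heartbeat (both taken from the issue description).
    """
    heartbeats = duration_hours * 3600 / heartbeat_seconds
    return int(heartbeats * blocks_per_heartbeat)


# ~7h at 1 block copy per 3s heartbeat
print(blocks_copied(7))  # 8400
```

Under these assumptions the surviving node would have handled on the order of 8,400 block copies in the 7h window, which gives a sense of how slowly the under-replication was being worked off before the node was lost.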
