hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3368) Missing blocks due to bad DataNodes comming up and down.
Date Fri, 11 May 2012 21:05:49 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273617#comment-13273617 ]

Suresh Srinivas commented on HDFS-3368:

Sorry, I should have added more details to my comment.
In your description of the problem, the failures first happen one by one: "At different times all
three nodes malfunctioned and died, causing the replicas to migrate to dn1, dn2, dn3." Later,
the failures happen together within a short time: "Expectedly do1, do2, do3 malfunction again and go
down shortly after reporting their blocks to NN".

Even if you change how replicas are chosen for deletion, the presence of nodes like do1, do2
and do3 means that the following scenario is still possible:
* do1, do2, do3 are chosen as targets for a new block.
* The client writes the block to these nodes.
* Shortly afterwards, do1, do2 and do3 all go down.
Now the replicas are no longer available.

HDFS replication assumes that the probability of all three nodes holding replicas of the same
block going down together within a short time is low. Given that, I am not sure this problem is
important enough.
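
To put rough numbers on that assumption, here is a minimal sketch. It assumes node failures are independent, and the per-node failure probabilities (0.001 for a healthy node, 0.5 for a node that keeps going up and down) are made-up illustrative values, not measurements from any cluster.

```python
def prob_all_replicas_lost(per_node_failure_probs):
    """Probability that every node holding a replica fails in the same window.

    Assumes failures are independent, so the per-node probabilities multiply.
    """
    result = 1.0
    for p in per_node_failure_probs:
        result *= p
    return result


# Hypothetical values: three healthy nodes vs. three flaky nodes.
healthy = prob_all_replicas_lost([0.001] * 3)
flaky = prob_all_replicas_lost([0.5] * 3)

print(f"healthy nodes: {healthy:.1e}")
print(f"flaky nodes:   {flaky}")
```

Under these assumed numbers the loss probability is negligible for healthy nodes but substantial once all three replicas sit on flaky nodes, which is the case the scenario above describes.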

Alternatively, given that the block placement policy is pluggable, could you write a custom
implementation and leave the default implementation unchanged?
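
As a rough illustration of what such a custom policy might do, here is a toy model. It does not use the Hadoop API: a real implementation would be a Java subclass of BlockPlacementPolicy, and the names below (Node, recent_restarts, max_restarts, both chooser functions) are hypothetical. The "default" chooser is also a simplification that just takes the first candidates, not the real rack-aware logic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    recent_restarts: int  # hypothetical health signal, not an HDFS field


def choose_targets_default(nodes, replication=3):
    """Toy stand-in for a default policy: take the first candidates."""
    return nodes[:replication]


def choose_targets_avoiding_flaky(nodes, replication=3, max_restarts=1):
    """Prefer nodes with few recent restarts; use flaky ones only as a last resort."""
    stable = [n for n in nodes if n.recent_restarts <= max_restarts]
    flaky = [n for n in nodes if n.recent_restarts > max_restarts]
    return (stable + flaky)[:replication]


cluster = [
    Node("do1", 5), Node("do2", 4), Node("do3", 6),
    Node("dn1", 0), Node("dn2", 0), Node("dn3", 1),
]

default_names = [n.name for n in choose_targets_default(cluster)]
custom_names = [n.name for n in choose_targets_avoiding_flaky(cluster)]
print(default_names)
print(custom_names)
```

In this sketch the custom chooser falls back to flaky nodes only when there are not enough stable ones, so a block can still be written on a small or degraded cluster.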

> Missing blocks due to bad DataNodes comming up and down.
> --------------------------------------------------------
>                 Key: HDFS-3368
>                 URL: https://issues.apache.org/jira/browse/HDFS-3368
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0, 1.0.0, 2.0.0, 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>         Attachments: blockDeletePolicy-0.22.patch, blockDeletePolicy-trunk.patch, blockDeletePolicy.patch
> All replicas of a block can be removed if bad DataNodes come up and down during cluster
> restart resulting in data loss.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

