hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10857) Rolling upgrade can make data unavailable when the cluster has many failed volumes
Date Mon, 24 Oct 2016 21:20:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603256#comment-15603256 ]

Arpit Agarwal commented on HDFS-10857:

Looks like {{checkDiskError}} should get the DataNode object lock for the {{dataDirs}} modification
to avoid a potential race with {{refreshVolumes}}.
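
To illustrate the locking suggestion, here is a minimal sketch. It is not the actual DataNode code: {{VolumeHolder}} and {{snapshot}} are hypothetical names, and the method bodies are simplified stand-ins. It assumes {{refreshVolumes}} already runs under the object lock, and shows {{checkDiskError}} taking the same lock only around the {{dataDirs}} modification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the DataNode; not the actual Hadoop class.
class VolumeHolder {
    // Configured data directories, guarded by this object's monitor.
    private final List<String> dataDirs = new ArrayList<>();

    VolumeHolder(List<String> initial) {
        dataDirs.addAll(initial);
    }

    // Stand-in for refreshVolumes: replaces the list under the object lock.
    synchronized void refreshVolumes(List<String> newDirs) {
        dataDirs.clear();
        dataDirs.addAll(newDirs);
    }

    // Stand-in for checkDiskError: removes failed directories from the list.
    void checkDiskError(List<String> failedDirs) {
        // Any slow disk probing would happen here, outside the lock.
        synchronized (this) {
            // Same monitor as refreshVolumes, so the two updates cannot interleave.
            dataDirs.removeAll(failedDirs);
        }
    }

    // Returns a copy so callers never see the list mid-update.
    synchronized List<String> snapshot() {
        return new ArrayList<>(dataDirs);
    }
}
```

Holding the lock only for the list update keeps the critical section short, which matters if the disk check itself is slow.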

> Rolling upgrade can make data unavailable when the cluster has many failed volumes
> ----------------------------------------------------------------------------------
>                 Key: HDFS-10857
>                 URL: https://issues.apache.org/jira/browse/HDFS-10857
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.4
>            Reporter: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-10857.branch-2.6.patch
>
> When the marker file or trash dir is created or removed during heartbeat response
> processing, an {{IOException}} is thrown if the operation is attempted on a failed volume.
> This stops processing of the remaining storage directories and of any DNA commands that
> were part of the heartbeat response.
> While this is happening, the block token key update does not happen, and all read and
> write requests start to fail until the upgrade is finalized and the DN receives a new key.
> All it takes is one failed volume. If there are three such nodes in the cluster, it is very
> likely that some blocks cannot be read. Unlike the common missing-block scenarios, the NN
> is unaware of the problem, although the effect is the same.
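
The failure mode in the description can be sketched as follows. This is an illustration only, not the attached patch: {{HeartbeatSketch}}, {{DirAction}}, {{processAllOrAbort}} and {{processEach}} are hypothetical names. The first loop aborts on the first {{IOException}}, so later directories are skipped; the second handles the exception per directory and keeps going, which is one possible way to contain the failure.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not code from the DataNode.
class HeartbeatSketch {
    // Hypothetical per-directory operation, e.g. creating a marker file.
    interface DirAction {
        void apply(String dir) throws IOException;
    }

    // Fragile form: the first IOException propagates and ends the loop,
    // so directories after the failed one are never processed.
    static void processAllOrAbort(List<String> dirs, DirAction action,
                                  List<String> done) throws IOException {
        for (String dir : dirs) {
            action.apply(dir);
            done.add(dir);
        }
    }

    // Tolerant form: a failure is recorded and the loop continues.
    // Returns the directories on which the action failed.
    static List<String> processEach(List<String> dirs, DirAction action,
                                    List<String> done) {
        List<String> failed = new ArrayList<>();
        for (String dir : dirs) {
            try {
                action.apply(dir);
                done.add(dir);
            } catch (IOException e) {
                failed.add(dir);
            }
        }
        return failed;
    }
}
```

With the tolerant form, the caller can still go on to the rest of the heartbeat response and deal with the failed directories separately.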

