hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
Date Thu, 03 Aug 2017 23:03:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113651#comment-16113651 ]

Andrew Wang commented on HDFS-8893:

Ping since this one is still on the critical list. Any progress Rushabh, Daryn?

> DNs with failed volumes stop serving during rolling upgrade
> -----------------------------------------------------------
>                 Key: HDFS-8893
>                 URL: https://issues.apache.org/jira/browse/HDFS-8893
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Daryn Sharp
>            Priority: Critical
> When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker to each of their volumes. If one of the volumes is bad, this will fail. When this failure happens, the DN does not update the key it received from the NN.
> Unfortunately, we had one failed volume on each of the 3 datanodes that held a replica.
> Keys expire after 20 hours, so at about 20 hours into the rolling upgrade, the DNs with failed volumes will stop serving clients.
> Here is the stack trace on the datanode side:
> {noformat}
> 2015-08-11 07:32:28,827 [DataNode: heartbeating to <nn1>8020] WARN datanode.DataNode: IOException in offerService
> java.io.IOException: Read-only file system
>         at java.io.UnixFileSystem.createFileExclusively(Native Method)
>         at java.io.File.createNewFile(File.java:947)
>         at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721)
>         at org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833)
>         at java.lang.Thread.run(Thread.java:722)
> {noformat}
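
For readers following the description above, here is a minimal Java sketch of the failure mode: a marker write that throws on one bad volume aborts whatever the caller was going to do next. It is not the HDFS code and not the patch for this issue. The class name, method name, and marker file name are all hypothetical, and the per-volume try/catch is only one possible way to keep a single bad volume from stopping the rest of the work.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; none of these names exist in Hadoop.
class RollingUpgradeMarkerSketch {
    // Hypothetical marker file name, chosen for this example.
    static final String MARKER_NAME = "rolling_upgrade_marker_sketch";

    /**
     * Tries to create the marker file on every volume directory.
     * An IOException on one volume is recorded and the loop continues,
     * so the caller returns normally and can carry on with later steps
     * (in the scenario described above, that would include the key update).
     * Returns the volumes on which the marker could not be created.
     */
    static List<File> markAllVolumes(List<File> volumeDirs) {
        List<File> failed = new ArrayList<>();
        for (File dir : volumeDirs) {
            File marker = new File(dir, MARKER_NAME);
            try {
                if (!marker.exists()) {
                    // Throws IOException on, for example, a read-only
                    // file system or a missing parent directory.
                    marker.createNewFile();
                }
            } catch (IOException e) {
                // Assumption: a bad volume is reported, not fatal.
                failed.add(dir);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        File good = new File(System.getProperty("java.io.tmpdir"),
                "volume-sketch-" + System.nanoTime());
        good.mkdirs();
        // A directory that does not exist stands in for a failed volume.
        File bad = new File(good, "no-such-dir/volume");

        List<File> failed = markAllVolumes(Arrays.asList(bad, good));
        System.out.println("volumes that failed: " + failed);
        System.out.println("good volume marked: "
                + new File(good, MARKER_NAME).exists());
    }
}
```

In this sketch the bad volume is listed first on purpose: the good volume still gets its marker, and the method returns instead of propagating the exception up to the heartbeat loop shown in the stack trace.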

This message was sent by Atlassian JIRA

