Date: Mon, 4 Apr 2016 20:31:26 +0000 (UTC)
From: "Rushabh S Shah (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade

    [ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224993#comment-15224993 ]

Rushabh S Shah commented on HDFS-8893:
--------------------------------------

This is still a bug. I don't have enough cycles to work on this, so I am moving it to the next release. Hopefully I will fix it by then.

> DNs with failed volumes stop serving during rolling upgrade
> -----------------------------------------------------------
>
>                 Key: HDFS-8893
>                 URL: https://issues.apache.org/jira/browse/HDFS-8893
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Daryn Sharp
>            Priority: Critical
>
> When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker to each of their volumes. If one of the volumes is bad, this will fail. When this failure happens, the DN does not update the key it received from the NN.
> Unfortunately, we had one failed volume on each of the 3 datanodes that held a replica.
> Keys expire after 20 hours, so at about 20 hours into the rolling upgrade, the DNs with failed volumes will stop serving clients.
> Here is the stack trace on the datanode side:
> {noformat}
> 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN datanode.DataNode: IOException in offerService
> java.io.IOException: Read-only file system
>         at java.io.UnixFileSystem.createFileExclusively(Native Method)
>         at java.io.File.createNewFile(File.java:947)
>         at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721)
>         at org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833)
>         at java.lang.Thread.run(Thread.java:722)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
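
For readers following the description above, here is a minimal standalone sketch of the failure shape it reports: a loop that writes a marker file to every volume and lets the first IOException escape, compared with a loop that records the bad volume and continues. This is not the Hadoop code and not a proposed patch. The class name, both method names, and the use of a missing directory to stand in for a read-only volume are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names here do not correspond to Hadoop classes.
public class RollingUpgradeMarkerSketch {

    // Fail-fast variant: the first bad volume throws, so remaining volumes
    // are skipped and the caller's later steps in the same pass never run.
    static void writeMarkersFailFast(List<Path> volumes, String markerName)
            throws IOException {
        for (Path volume : volumes) {
            createMarker(volume, markerName);
        }
    }

    // Tolerant variant: a bad volume is recorded and the loop continues,
    // so the caller can still proceed and handle the failures separately.
    static List<Path> writeMarkersTolerant(List<Path> volumes, String markerName) {
        List<Path> failed = new ArrayList<>();
        for (Path volume : volumes) {
            try {
                createMarker(volume, markerName);
            } catch (IOException e) {
                failed.add(volume);
            }
        }
        return failed;
    }

    // Creates the marker file; an already existing marker counts as success.
    private static void createMarker(Path volume, String markerName)
            throws IOException {
        try {
            Files.createFile(volume.resolve(markerName));
        } catch (FileAlreadyExistsException e) {
            // Marker was written earlier; nothing to do.
        }
    }

    public static void main(String[] args) throws IOException {
        Path good = Files.createTempDirectory("vol-good");
        // A directory that does not exist stands in for the failed volume.
        Path bad = good.resolve("missing-volume");

        List<Path> failed =
                writeMarkersTolerant(Arrays.asList(bad, good), "rolling_upgrade");
        System.out.println("failed volumes: " + failed);
        System.out.println("marker on good volume: "
                + Files.exists(good.resolve("rolling_upgrade")));
    }
}
```

In the fail-fast variant, the bad volume listed first prevents the marker from being written to the healthy volume as well, which is the same propagation pattern the stack trace shows reaching offerService.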