Mailing-List: contact hdfs-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 3 May 2017 22:10:04 +0000 (UTC)
From: "Manoj Govindassamy (JIRA)" <jira@apache.org>
To: hdfs-dev@hadoop.apache.org
Message-ID: <JIRA.13068959.1493849347000.120571.1493849404490@Atlassian.JIRA>
In-Reply-To: <JIRA.13068959.1493849347000@Atlassian.JIRA>
References: <JIRA.13068959.1493849347000@Atlassian.JIRA> <JIRA.13068959.1493849347740@jira-lw-us.apache.org>
Subject: [jira] [Created] (HDFS-11749) Ongoing file write fails when its
 pipeline DataNode is pulled out for maintenance
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 03 May 2017 22:10:09 -0000

Manoj Govindassamy created HDFS-11749:
-----------------------------------------

             Summary: Ongoing file write fails when its pipeline DataNode is pulled out for maintenance
                 Key: HDFS-11749
                 URL: https://issues.apache.org/jira/browse/HDFS-11749
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.0.0-alpha1
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy


HDFS Maintenance State (HDFS-7877) is supposed to put DataNodes into the ENTERING_MAINTENANCE state first; once all of their blocks are sufficiently replicated, the DNs transition to the IN_MAINTENANCE state. UNDER_CONSTRUCTION files, and any ongoing writes to them, should not fail because of the maintenance state transition. However, in a few runs I have seen ongoing writes to open files fail when their pipeline DNs are pulled out via the Maintenance State feature. A test case is attached.
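
For readers unfamiliar with the feature, the expected transition can be sketched roughly as below. This is a simplified model and not the NameNode implementation: the class MaintenanceStateSketch, the BlockInfo holder, and the next() helper are all hypothetical names, and the real admin states include more values than the three shown.

```java
public class MaintenanceStateSketch {
    // Simplified subset of DataNode admin states (assumption: the real enum has more values).
    enum AdminState { NORMAL, ENTERING_MAINTENANCE, IN_MAINTENANCE }

    // Hypothetical per-block summary: live replicas elsewhere vs. the minimum required.
    static final class BlockInfo {
        final int liveReplicas;
        final int minRequired;

        BlockInfo(int liveReplicas, int minRequired) {
            this.liveReplicas = liveReplicas;
            this.minRequired = minRequired;
        }
    }

    // A node leaves ENTERING_MAINTENANCE only when every block is sufficiently replicated.
    static AdminState next(AdminState current, BlockInfo... blocks) {
        if (current != AdminState.ENTERING_MAINTENANCE) {
            return current; // other states are not modeled in this sketch
        }
        for (BlockInfo b : blocks) {
            if (b.liveReplicas < b.minRequired) {
                return AdminState.ENTERING_MAINTENANCE; // keep waiting for re-replication
            }
        }
        return AdminState.IN_MAINTENANCE;
    }

    public static void main(String[] args) {
        System.out.println(next(AdminState.ENTERING_MAINTENANCE,
                new BlockInfo(2, 1), new BlockInfo(1, 1)));
        System.out.println(next(AdminState.ENTERING_MAINTENANCE,
                new BlockInfo(0, 1)));
    }
}
```

The bug reported here is about what happens to an open write pipeline while a node is in this transition, which the sketch deliberately does not model.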

{code}
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:49306,DS-eeca7153-fba2-4f2e-a044-0a292fc6dc6d,DISK], DatanodeInfoWithStorage[127.0.0.1:49302,DS-a5adf33c-81d0-413b-879c-9c4d9acbb72a,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:49306,DS-eeca7153-fba2-4f2e-a044-0a292fc6dc6d,DISK], DatanodeInfoWithStorage[127.0.0.1:49302,DS-a5adf33c-81d0-413b-879c-9c4d9acbb72a,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.

	at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1299)
	at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1365)
	at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1545)
	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1460)
	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1443)
	at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1251)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:668)
{code}
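
The exception above names the DEFAULT datanode replacement policy. As I understand the description of that policy in hdfs-default.xml, the client asks for a replacement node only under the condition sketched below. This is a standalone illustration and not the Hadoop source: the class ReplacePolicySketch and the method shouldAddDatanode are hypothetical names, and a replication factor of 3 is assumed for this test (the trace itself only shows two nodes in the pipeline).

```java
public class ReplacePolicySketch {
    /**
     * Sketch of the documented DEFAULT condition: add a new datanode only if
     * replication >= 3 and either the pipeline has shrunk to floor(r/2) nodes
     * or fewer, or the block has been appended/hflushed.
     */
    static boolean shouldAddDatanode(int replication, int existing,
                                     boolean isAppend, boolean isHflushed) {
        if (existing == 0 || existing >= replication) {
            return false; // nothing left to extend, or the pipeline is already full
        }
        return replication >= 3
                && (existing <= replication / 2 || isAppend || isHflushed);
    }

    public static void main(String[] args) {
        // Assumed scenario: replication 3, two nodes left, block already hflushed.
        System.out.println(shouldAddDatanode(3, 2, false, true));
        // Same pipeline without hflush/append.
        System.out.println(shouldAddDatanode(3, 2, false, false));
    }
}
```

Under these assumptions, a write that has been hflushed and then loses one of three pipeline nodes would request a replacement, and the write fails as shown in the trace if no suitable DataNode is offered.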

