Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 3 Jan 2017 03:12:58 +0000 (UTC)
From: "Rakesh R (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.13031241.1483087473000.625736.1483413178532@Atlassian.JIRA>
In-Reply-To: <JIRA.13031241.1483087473000@Atlassian.JIRA>
References: <JIRA.13031241.1483087473000@Atlassian.JIRA> <JIRA.13031241.1483087473599@arcas>
Subject: [jira] [Commented] (HDFS-11284) [SPS]: Avoid running SPS under
 safemode and fix issues in target node choosing.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Tue, 03 Jan 2017 03:13:00 -0000


    [ https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793992#comment-15793992 ] 

Rakesh R commented on HDFS-11284:
---------------------------------

bq. The #3 still exists. Say we have these datanodes in our cluster:
Thanks [~yuanbo] for the detailed analysis. In your example, the replication factor is being reduced from 4 to 3. IIUC, the {{ReplicaNotFoundException}} occurs for the extra replica block, which is expected due to block deletion. It would be great if you could explore the impact of this exception and of the retries. I would also appreciate it if you could contribute a unit test case that shows the behavior. Thanks!

If it is a case of under-replicated blocks, then the coordinator datanode will hit an exception during the movement and send this error result to SPS. SPS will then schedule retries, right?
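
To make the retry question concrete, here is a minimal sketch of the flow as I understand it: the coordinator reports a per-block result, and the SPS side queues failed blocks for another attempt. All names here ({{RetrySketch}}, {{MoveResult}}, {{Satisfier}}, {{onResult}}) are hypothetical and only for illustration; this is not the actual SPS code or API.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class RetrySketch {
    /** Hypothetical outcome a coordinator datanode reports for one block movement. */
    enum MoveResult { SUCCESS, FAILURE }

    /** Hypothetical stand-in for the SPS side: failed block IDs are queued for a retry. */
    static class Satisfier {
        private final Queue<Long> retryQueue = new ArrayDeque<>();

        /** Called when the coordinator sends back the result of a block movement. */
        void onResult(long blockId, MoveResult result) {
            if (result == MoveResult.FAILURE) {
                // Assumption: any failure (e.g. an exception during movement) is retried later.
                retryQueue.add(blockId);
            }
        }

        /** Number of blocks currently waiting for a retry. */
        int pendingRetries() {
            return retryQueue.size();
        }

        /** Next block ID to retry, or null if nothing is pending. */
        Long nextRetry() {
            return retryQueue.poll();
        }
    }

    public static void main(String[] args) {
        Satisfier sps = new Satisfier();
        sps.onResult(1001L, MoveResult.SUCCESS); // moved fine, nothing queued
        sps.onResult(1002L, MoveResult.FAILURE); // movement failed, queued for retry
        System.out.println("pending retries: " + sps.pendingRetries());
    }
}
```

Whether a failure caused by a deleted extra replica should be queued like this at all, or dropped because the replica is legitimately gone, is the part that needs exploring.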

> [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-11284
>                 URL: https://issues.apache.org/jira/browse/HDFS-11284
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Yuanbo Liu
>            Assignee: Yuanbo Liu
>
> Recently I've found that SPS is not stable under some conditions:
> * SPS runs under safe mode.
> * There are some overlapping nodes among the chosen target nodes.
> * The actual number of replicas of a block doesn't match the replication factor. For example, the actual replica count is 2 while the replication factor is 3.

