Date: Tue, 3 Jan 2017 03:12:58 +0000 (UTC)
From: "Rakesh R (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11284) [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.

[ https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793992#comment-15793992 ]

Rakesh R commented on HDFS-11284:
---------------------------------

bq. The #3 still exists.
Say we have these datanodes in our cluster:

Thanks [~yuanbo] for the detailed analysis. In your example, the replication factor is being reduced from 4 to 3. IIUC, {{ReplicaNotFoundException}} occurs for the extra replica block, which is expected because that replica has been deleted. It would be great if you could explore the impact of this exception and of the retries. I would also appreciate it if you could contribute a unit test case that shows the behavior. Thanks!

If it is a case of under-replicated blocks, then the coordinator datanode will hit an exception during the movement and send this error result to SPS. SPS will then schedule retries, right?

> [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-11284
>                 URL: https://issues.apache.org/jira/browse/HDFS-11284
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Yuanbo Liu
>            Assignee: Yuanbo Liu
>
> Recently I've found that under some conditions SPS is not stable:
> * SPS runs under safe mode.
> * There are some overlapping nodes among the chosen target nodes.
> * The real replication number of a block doesn't match the replication factor. For example, the real replication is 2 while the replication factor is 3.
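
The retry question raised in the comment above can be sketched as a toy model. This is not Hadoop code and does not reflect the actual SPS implementation: `ToyCoordinator`, `run_sps`, `ReplicaNotFound`, and the `max_attempts` limit are all hypothetical names chosen for illustration. It assumes only what the comment supposes, namely that the coordinator reports a failed movement back to SPS and that SPS then schedules a retry.

```python
from collections import deque
from dataclasses import dataclass, field


class ReplicaNotFound(Exception):
    """Stand-in for HDFS's ReplicaNotFoundException (toy model only)."""


@dataclass
class ToyCoordinator:
    """Hypothetical coordinator datanode: moves a block or reports failure."""
    missing_replicas: set = field(default_factory=set)

    def move_block(self, block_id: str) -> bool:
        try:
            if block_id in self.missing_replicas:
                # The replica was already deleted, so the move cannot proceed.
                raise ReplicaNotFound(block_id)
            return True   # movement succeeded
        except ReplicaNotFound:
            return False  # error result reported back to the scheduler


def run_sps(block_ids, coordinator, max_attempts=3):
    """Toy scheduler loop: failed movements are re-queued until max_attempts."""
    queue = deque((block_id, 1) for block_id in block_ids)
    moved, gave_up = [], []
    while queue:
        block_id, attempt = queue.popleft()
        if coordinator.move_block(block_id):
            moved.append(block_id)
        elif attempt < max_attempts:
            queue.append((block_id, attempt + 1))  # schedule a retry
        else:
            gave_up.append(block_id)  # retries exhausted (assumed policy)
    return moved, gave_up
```

In this model a block whose replica stays missing is retried until the assumed limit and then dropped, which is the kind of impact the requested unit test would need to pin down for the real code.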