Date: Tue, 12 Sep 2017 01:29:00 +0000 (UTC)
From: "Weiwei Yang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

    [ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162330#comment-16162330 ]

Weiwei Yang commented on HDFS-12098:
------------------------------------

Hi [~anu], [~vagarychen],

Thanks for revisiting this. I could not reproduce it on the latest code base either; it looks like it was fixed by some other patches. This no longer seems to be a valid issue, so I think we can close it. Thanks for spending time trying to reproduce this.

> Ozone: Datanode is unable to register with scm if scm starts later
> ------------------------------------------------------------------
>
>                 Key: HDFS-12098
>                 URL: https://issues.apache.org/jira/browse/HDFS-12098
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, ozone, scm
>    Affects Versions: HDFS-7240
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Critical
>              Labels: ozoneMerge
>             Fix For: HDFS-7240
>
>         Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase-1.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Steps to reproduce
> 1. Start the namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start the datanode
> {{./bin/hdfs datanode}}
> You will see the following connection retries:
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
> {noformat}
> This is expected because scm has not been started yet.
> 3. Start scm
> {{./bin/hdfs scm}}
> The datanode is expected to register with this scm, and the scm log should show:
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but this log line did *NOT* appear. (_I debugged into the code and found that the datanode state unexpectedly transitioned to SHUTDOWN because of leaked threads: each of those threads counted toward moving to the next state, and they all set the state to SHUTDOWN_)
> 4. Create a container from the scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> This fails with the following exception:
> {noformat}
> Creating container : 20170714c0.
> Error executing command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException): Unable to create container while in chill mode
> 	at org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
> 	at org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
> 	at org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> Because the datanode did not register with scm, scm is still in chill mode.
> *Note*: if scm is started first, this issue does not occur, and I can create a container from the CLI without any problem.
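The datanode log in step 2 names the IPC client's retry policy, RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS), and its timestamps (21:16:48 through 21:16:51) are one second apart, as a fixed sleep would produce. The Java sketch below illustrates only those semantics: make an attempt, and on failure sleep a fixed interval and try again until the retry budget is spent. The class and method names (FixedSleepRetrySketch, connectWithRetries) and the simulated scm are hypothetical. This is not Hadoop's implementation; Hadoop's policy is created through RetryPolicies.retryUpToMaximumCountWithFixedSleep in org.apache.hadoop.io.retry.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of "retry up to a maximum count with a fixed sleep" semantics,
 * modeled on the policy name printed in the datanode log. Not Hadoop's code.
 */
public final class FixedSleepRetrySketch {

    private FixedSleepRetrySketch() {}

    /**
     * Calls {@code attempt} until it returns true or {@code maxRetries} retries are used,
     * so there are at most maxRetries + 1 calls. Sleeps a fixed interval between calls.
     *
     * @return the number of retries used before success, or -1 if every attempt failed
     * @throws InterruptedException if interrupted while sleeping
     */
    public static int connectWithRetries(BooleanSupplier attempt, int maxRetries,
                                         long sleepTime, TimeUnit unit)
            throws InterruptedException {
        for (int tried = 0; ; tried++) {
            if (attempt.getAsBoolean()) {
                return tried; // success after 'tried' retries
            }
            if (tried >= maxRetries) {
                return -1; // retry budget exhausted; the caller gives up
            }
            // Mirrors the shape of the logged message, not its exact text.
            System.out.printf("Retrying connect. Already tried %d time(s); maxRetries=%d%n",
                    tried, maxRetries);
            unit.sleep(sleepTime); // fixed sleep, no backoff
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for scm: it becomes reachable on the 4th connection attempt.
        int[] calls = {0};
        BooleanSupplier scmUpOnFourthAttempt = () -> ++calls[0] > 3;
        int retries = connectWithRetries(scmUpOnFourthAttempt, 10, 1, TimeUnit.MILLISECONDS);
        System.out.println("connected after " + retries + " retries");
    }
}
```

Running main prints three "Already tried" lines (0, 1, 2) and then "connected after 3 retries". The sleep is shortened to milliseconds so the demo finishes quickly; passing 1 with TimeUnit.SECONDS would match the logged policy.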