Date: Fri, 12 Aug 2016 18:20:20 +0000 (UTC)
From: "Eric Badger (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-10755) TestDecommissioningStatus BindException Failure

    [ https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419248#comment-15419248 ]

Eric Badger commented on HDFS-10755:
------------------------------------

The pre-commit test failure is unrelated to the patch. I believe the patch is ready for review.
[~kihwal], could you take a look?

> TestDecommissioningStatus BindException Failure
> -----------------------------------------------
>
>                 Key: HDFS-10755
>                 URL: https://issues.apache.org/jira/browse/HDFS-10755
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>         Attachments: HDFS-10755.001.patch, HDFS-10755.002.patch
>
>
> Tests in TestDecommissioningStatus call MiniDFSCluster.restartDataNode(). The restarted datanode must come back up on the same (initially ephemeral) port it was using before it was shut down. Because of this, there is an inherent race condition: another process could bind to that port while the datanode is down. If this happens, we get a BindException failure. However, all of the tests in TestDecommissioningStatus depend on the cluster being up and running in order to run correctly, so if one test takes down the cluster, the subsequent tests also fail. Below I show the BindException failure as well as the subsequent test failure that it caused.
> {noformat}
> java.net.BindException: Problem binding to [localhost:35370] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:436)
>   at sun.nio.ch.Net.bind(Net.java:428)
>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:430)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:768)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2391)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:429)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037)
>   at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426)
> {noformat}
>
> {noformat}
> java.lang.AssertionError: Number of Datanodes expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275)
> {noformat}
>
> I don't think there's any way to avoid the inherent race condition with getting the same ephemeral port, but we can definitely fix the tests so that a BindException in one test doesn't cause subsequent tests to fail.
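The port race described in the issue can be reproduced outside Hadoop with plain sockets. The sketch below is illustrative only: the class name `EphemeralPortRace` is hypothetical, and ordinary `java.net.ServerSocket` instances stand in for the DataNode IPC server and for the "other process" that grabs the port. It assumes typical Linux/macOS socket semantics, where two listening sockets cannot bind the same address and port at once.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

/**
 * Hypothetical stand-alone illustration of the HDFS-10755 race:
 * a server first binds an ephemeral port, shuts down, and then must
 * restart on that exact port, which something else may have taken.
 * Plain ServerSockets stand in for the DataNode; this is not Hadoop code.
 */
public class EphemeralPortRace {

    /** Returns true if the "restart" fails with BindException. */
    public static boolean rebindFailsAfterPortStolen() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try {
            int port;
            // First start: port 0 asks the OS for an ephemeral port.
            try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
                port = first.getLocalPort();
            } // Shutdown: closing the socket releases the port.

            // While the "server" is down, another socket binds that port.
            try (ServerSocket rogue = new ServerSocket(port, 50, loopback)) {
                // Restart: try to come back up on the same port.
                try (ServerSocket restarted = new ServerSocket(port, 50, loopback)) {
                    return false; // rebind unexpectedly succeeded
                } catch (BindException e) {
                    return true;  // "Address already in use"
                }
            }
        } catch (IOException e) {
            // Setup itself failed (e.g. the port was taken before "rogue").
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("restart failed with BindException: "
                + rebindFailsAfterPortStolen());
    }
}
```

Because the restart must reuse a fixed port, the failure cannot be prevented from inside the restarting process; it can only be contained. One common way to contain it (not necessarily what the attached patches do) is to create and tear down the MiniDFSCluster per test, for example in JUnit `@Before`/`@After` methods, so a cluster lost in one test does not carry over into the next.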