Date: Thu, 1 Jun 2017 14:20:04 +0000 (UTC)
From: "Kihwal Lee (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

    [ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033048#comment-16033048 ]

Kihwal Lee commented on HDFS-10816:
-----------------------------------

+1

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10816
>                 URL: https://issues.apache.org/jira/browse/HDFS-10816
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>         Attachments: HDFS-10816.001.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs expected:<3> but was:<2>
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.failNotEquals(Assert.java:743)
>     at org.junit.Assert.assertEquals(Assert.java:118)
>     at org.junit.Assert.assertEquals(Assert.java:555)
>     at org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the replication monitor. The default replication monitor interval is 3 seconds, which is roughly how long the test normally takes to run. The test deletes a file and then acquires the namesystem write lock. However, if the replication monitor fires between those two steps, the test fails because the monitor itself invalidates one of the blocks. The failure can easily be reproduced by removing the sleep() in the ReplicationMonitor's run() method in BlockManager.java, so that the replication monitor executes as often as possible and exacerbates the race.
> To fix the test, all that is needed is to turn off the replication monitor.
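The race described in the issue can be modeled without Hadoop at all. The following is a minimal, hypothetical Java sketch, not the test or the attached patch: ToyBlockManager, monitorTick() and countAfterDelete() are illustrative names, and the assumption that one monitor tick invalidates exactly one pending block is a simplification chosen to reproduce the reported "expected:<3> but was:<2>". Instead of real threads, it replays the possible interleavings deterministically: the test deletes a file (queuing one invalidation per DN), a monitor tick may land in the gap, and then the test takes the write lock and counts what is left.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class InvalidateRaceSketch {

    // Hypothetical stand-in for the NameNode's pending-invalidation bookkeeping.
    static final class ToyBlockManager {
        final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();
        final Deque<String> pendingInvalidates = new ArrayDeque<>();
        boolean monitorEnabled = true;

        // Deleting a file with one replica per DN queues one invalidation per DN.
        void deleteFile(int numDatanodes) {
            nsLock.writeLock().lock();
            try {
                for (int i = 0; i < numDatanodes; i++) {
                    pendingInvalidates.add("dn" + i);
                }
            } finally {
                nsLock.writeLock().unlock();
            }
        }

        // One tick of a hypothetical background monitor. Assumption: each tick
        // hands exactly one pending invalidation to a DN, removing it from the queue.
        void monitorTick() {
            if (!monitorEnabled) {
                return; // monitor turned off: nothing is drained behind the test's back
            }
            nsLock.writeLock().lock();
            try {
                pendingInvalidates.poll();
            } finally {
                nsLock.writeLock().unlock();
            }
        }
    }

    // Replays the test's steps: delete, optionally a monitor tick in the gap,
    // then take the write lock and count the pending invalidations.
    public static int countAfterDelete(boolean monitorEnabled, boolean monitorFiresInGap) {
        ToyBlockManager bm = new ToyBlockManager();
        bm.monitorEnabled = monitorEnabled;
        bm.deleteFile(3);
        if (monitorFiresInGap) {
            bm.monitorTick(); // the racy window between delete and lock
        }
        bm.nsLock.writeLock().lock();
        try {
            return bm.pendingInvalidates.size();
        } finally {
            bm.nsLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(countAfterDelete(true, true));   // prints 2: the reported failure
        System.out.println(countAfterDelete(true, false));  // prints 3: lucky timing
        System.out.println(countAfterDelete(false, true));  // prints 3: monitor off, deterministic
    }
}
```

With the monitor enabled, the count depends on whether a tick lands in the gap (2 vs. 3); with it disabled, the count is always 3, which is why turning the monitor off makes the assertion deterministic rather than timing-dependent.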