Date: Thu, 9 Feb 2017 04:05:41 +0000 (UTC)
From: "Yiqun Lin (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-11398) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure still fails intermittently

[ https://issues.apache.org/jira/browse/HDFS-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-11398:
-----------------------------
    Attachment: HDFS-11398.002.patch

Attaching a new patch that uses Mockito to keep the blocks from being re-replicated.

> TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure still fails intermittently
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-11398
>                 URL: https://issues.apache.org/jira/browse/HDFS-11398
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: failure.log, HDFS-11398.001.patch, HDFS-11398.002.patch, HDFS-11398-reproduce.patch
>
> The test {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} still fails intermittently in trunk after HDFS-11316. The stack trace:
> {code}
> testUnderReplicationAfterVolFailure(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure)  Time elapsed: 95.021 sec  <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2017-02-07 07:00:34,193
> ....
> java.lang.Thread.State: RUNNABLE
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:511)
>         at java.lang.Thread.run(Thread.java:745)
>         at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:276)
>         at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure(TestDataNodeVolumeFailure.java:412)
> {code}
> I looked into this and found there is a chance that the value {{UnderReplicatedBlocksCount}} will no longer be > 0. My analysis follows:
> In {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}}, the test creates files to trigger the disk error checking. The related code:
> {code}
> Path file1 = new Path("/test1");
> DFSTestUtil.createFile(fs, file1, 1024, (short)3, 1L);
> DFSTestUtil.waitReplication(fs, file1, (short)3);
> // Fail the first volume on both datanodes
> File dn1Vol1 = new File(dataDir, "data"+(2*0+1));
> File dn2Vol1 = new File(dataDir, "data"+(2*1+1));
> DataNodeTestUtils.injectDataDirFailure(dn1Vol1, dn2Vol1);
> Path file2 = new Path("/test2");
> DFSTestUtil.createFile(fs, file2, 1024, (short)3, 1L);
> DFSTestUtil.waitReplication(fs, file2, (short)3);
> {code}
> This leads to one problem: if the cluster is busy, it can take a long time for the replication of file2 to reach the desired value. During that time, the under-replicated blocks of file1 can also be re-replicated in the cluster. Once that happens, the condition {{underReplicatedBlocks > 0}} will never be satisfied.
> This can be reproduced in my local env.
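For illustration, the polling pattern the test relies on (in the style of {{GenericTestUtils.waitFor}}) can be sketched as a minimal, self-contained loop. This is a simplified stand-in, not Hadoop's actual implementation; it shows why the wait can only end in the {{TimeoutException}} above once file1's blocks have already been re-replicated and the condition can no longer become true.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Minimal stand-in for the GenericTestUtils.waitFor pattern: poll a
// condition until it holds or the timeout elapses.
public class WaitFor {
    public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!check.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                // The failure mode seen in the stack trace: the condition
                // (underReplicatedBlocks > 0) never became true in time.
                throw new TimeoutException("Timed out waiting for condition.");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate the race: re-replication has already repaired file1's
        // blocks before the check runs, so the count stays at 0 and the
        // wait can only time out.
        final long[] underReplicated = {0};
        try {
            waitFor(() -> underReplicated[0] > 0, 10, 100);
            System.out.println("condition met");
        } catch (TimeoutException e) {
            System.out.println("timeout: " + e.getMessage());
        }
    }
}
```

The sketch makes the flakiness concrete: correctness of the wait depends on the condition staying observable for the whole polling window, which a busy cluster does not guarantee.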
> Actually, we can use an easier approach, {{DataNodeTestUtils.waitForDiskError}}, to replace this; it runs faster and is more reliable.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org