Date: Thu, 2 Feb 2017 01:37:51 +0000 (UTC)
From: "Yiqun Lin (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

     [ https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-11353:
-----------------------------
    Attachment: HDFS-11353.006.patch

> Improve the unit tests relevant to DataNode volume failure testing
> ------------------------------------------------------------------
>
>                 Key: HDFS-11353
>                 URL: https://issues.apache.org/jira/browse/HDFS-11353
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-11353.001.patch, HDFS-11353.002.patch, HDFS-11353.003.patch, HDFS-11353.004.patch, HDFS-11353.005.patch, HDFS-11353.006.patch
>
>
> Currently, many tests whose names start with {{TestDataNodeVolumeFailure*}} frequently time out or fail. I found one such failing test in a recent Jenkins build.
> The stack info:
> {code}
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures
> java.util.concurrent.TimeoutException: Timed out waiting for DN to die
> 	at org.apache.hadoop.hdfs.DFSTestUtil.waitForDatanodeDeath(DFSTestUtil.java:702)
> 	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:208)
> {code}
> The related code:
> {code}
>     /*
>      * Now fail the 2nd volume on the 3rd datanode. All its volumes
>      * are now failed and so it should report two volume failures
>      * and that it's no longer up. Only wait for two replicas since
>      * we'll never get a third.
>      */
>     DataNodeTestUtils.injectDataDirFailure(dn3Vol2);
>     Path file3 = new Path("/test3");
>     DFSTestUtil.createFile(fs, file3, 1024, (short)3, 1L);
>     DFSTestUtil.waitReplication(fs, file3, (short)2);
>     // The DN should consider itself dead
>     DFSTestUtil.waitForDatanodeDeath(dns.get(2));
> {code}
> Here the code waits for the datanode to fail all of its volumes and then become dead, but the wait timed out. It would be better to first check that all the volumes have failed and only then wait for the datanode to die.
> In addition, we can use the method {{checkDiskErrorSync}} to run the disk error check instead of creating files. In this JIRA, I would like to extract this logic and define it in {{DataNodeTestUtils}}, so that we can reuse the method for datanode volume failure testing in the future.
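
The ordering proposed in the description (trigger the disk check directly, confirm the expected number of failed volumes, and only then wait for the datanode to die) can be illustrated with a small standalone sketch. Nothing here is Hadoop code or taken from the attached patches: `FakeDataNode`, `waitFor`, and `failVolumeAndWait` are hypothetical stand-ins, and the fake `checkDiskErrorSync` only borrows the name of the real method to show where such a call would sit.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class VolumeFailureWaitSketch {

    /** Polls check every intervalMs until it returns true, or throws after timeoutMs. */
    public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs, String what)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Timed out waiting for " + what);
            }
            Thread.sleep(intervalMs);
        }
    }

    /** Hypothetical stand-in for a datanode; not the Hadoop DataNode class. */
    public static final class FakeDataNode {
        private final int totalVolumes;
        private int brokenOnDisk;   // failures injected but not yet noticed by the node
        private int failedVolumes;  // failures the node has actually detected
        private boolean alive = true;

        public FakeDataNode(int totalVolumes) {
            this.totalVolumes = totalVolumes;
        }

        /** Simulates breaking one more data directory on disk. */
        public synchronized void injectVolumeFailure() {
            if (brokenOnDisk < totalVolumes) {
                brokenOnDisk++;
            }
        }

        /** Stand-in for a synchronous disk check: detects broken volumes right away. */
        public synchronized void checkDiskErrorSync() {
            failedVolumes = brokenOnDisk;
            if (failedVolumes == totalVolumes) {
                alive = false; // a node with no working volumes shuts down
            }
        }

        public synchronized int getFailedVolumes() {
            return failedVolumes;
        }

        public synchronized boolean isAlive() {
            return alive;
        }
    }

    /** Hypothetical reusable helper: inject a failure, run the check, wait for the count. */
    public static void failVolumeAndWait(FakeDataNode dn, int expectedFailed)
            throws TimeoutException, InterruptedException {
        dn.injectVolumeFailure();
        dn.checkDiskErrorSync(); // replaces "create a file and hope the write hits the bad disk"
        waitFor(() -> dn.getFailedVolumes() == expectedFailed, 10, 1000,
                expectedFailed + " failed volume(s)");
    }

    public static void main(String[] args) throws Exception {
        FakeDataNode dn = new FakeDataNode(2);
        failVolumeAndWait(dn, 1);
        failVolumeAndWait(dn, 2);
        // Only after both failures are confirmed do we wait for the node to die.
        waitFor(() -> !dn.isAlive(), 10, 1000, "DN to die");
        System.out.println("failed volumes = " + dn.getFailedVolumes()
                + ", alive = " + dn.isAlive());
    }
}
```

The point of splitting the wait in two is diagnostic: if the volume failure is never detected, the test fails at the volume-count check with a specific message, rather than at the later, less informative "Timed out waiting for DN to die".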