Date: Tue, 14 May 2013 02:45:16 +0000 (UTC)
From: "Tian Hong Wang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Subject: [jira] [Updated] (HDFS-4815) Double call countReplicas() to fetch corruptReplicas and liveReplicas is not needed

     [ https://issues.apache.org/jira/browse/HDFS-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tian Hong Wang updated HDFS-4815:
---------------------------------
    Description:
In TestRBWBlockInvalidation, the original code is:

    while (!isCorruptReported) {
      if (countReplicas(namesystem, blk).corruptReplicas() > 0) {
        isCorruptReported = true;
      }
      Thread.sleep(100);
    }
    assertEquals("There should be 1 replica in the corruptReplicasMap", 1,
        countReplicas(namesystem, blk).corruptReplicas());

Once the program detects that one corrupt replica exists, it leaves the while loop.
After that, it calls countReplicas() again in assertEquals(). But sometimes I hit the following error:

    java.lang.AssertionError: There should be 1 replica in the corruptReplicasMap expected:<1> but was:<0>

Evidently, by the time countReplicas() is called a second time in assertEquals(), the corruptReplicas value has changed: the loop still calls Thread.sleep(100) after detecting the corrupt replica, and BlockManager recovered the corrupt block during that sleep.

So what I do is:
1) once one corrupt replica is detected, break out of the loop without calling sleep(); the same applies to liveReplicas
2) don't call countReplicas() a second time in assertEquals() to re-check corruptReplicas and liveReplicas
3) sometimes the test case times out, so I shorten the block report interval

> Double call countReplicas() to fetch corruptReplicas and liveReplicas is not needed
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-4815
>                 URL: https://issues.apache.org/jira/browse/HDFS-4815
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Tian Hong Wang
>            Assignee: Tian Hong Wang
>              Labels: patch
>         Attachments: HDFS-4815.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
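
The approach described in the issue (read the replica count once per loop iteration, stop as soon as it is positive, and assert on that captured value instead of calling countReplicas() again) could be sketched roughly as below. This is not the attached patch. The class name CorruptReplicaPolling, the helper waitForPositiveCount, and the simulated counter are all hypothetical; in the real test the supplier would presumably wrap countReplicas(namesystem, blk).corruptReplicas().

```java
import java.util.concurrent.TimeoutException;
import java.util.function.IntSupplier;

// Hypothetical sketch; names here are illustrative and not taken from HDFS-4815.patch.
final class CorruptReplicaPolling {

    /**
     * Polls the counter until it reports a positive value and returns the
     * value that was actually observed, so the caller never has to read the
     * counter a second time.
     */
    public static int waitForPositiveCount(IntSupplier counter,
                                           long pollMillis,
                                           long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            int observed = counter.getAsInt();   // single read per iteration
            if (observed > 0) {
                return observed;                 // return at once: no sleep, no re-read
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("count never became positive");
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated counter (an assumption for illustration): it reports 0, 0,
        // then 1 exactly once, then 0 again, like a corrupt replica that is
        // repaired shortly after being reported.
        int[] readings = {0, 0, 1, 0};
        int[] next = {0};
        IntSupplier transientCounter =
            () -> readings[Math.min(next[0]++, readings.length - 1)];

        int corrupt = waitForPositiveCount(transientCounter, 10, 1000);
        System.out.println("observed corrupt replicas: " + corrupt);
    }
}
```

With this shape, the test would assert on the returned value (for example, comparing it to 1) rather than querying the counter again, so a recovery that happens after detection cannot turn the observed 1 into a 0.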