hadoop-hdfs-issues mailing list archives

From "Tian Hong Wang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4815) Double call countReplicas() to fetch corruptReplicas and liveReplicas is not needed
Date Tue, 14 May 2013 02:45:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tian Hong Wang updated HDFS-4815:
---------------------------------

    Description: 
In TestRBWBlockInvalidation, the original code is:

    while (!isCorruptReported) {
      if (countReplicas(namesystem, blk).corruptReplicas() > 0) {
        isCorruptReported = true;
      }
      Thread.sleep(100);
    }
    assertEquals("There should be 1 replica in the corruptReplicasMap", 1,
        countReplicas(namesystem, blk).corruptReplicas());

Once the program detects a corrupt replica, it leaves the while loop (after one more sleep). It then
calls countReplicas() a second time inside assertEquals(). Intermittently, this fails with:

    java.lang.AssertionError: There should be 1 replica in the corruptReplicasMap expected:<1> but was:<0>

Clearly, by the time of the second countReplicas() call in assertEquals(), the corruptReplicas
value has changed: the test thread was sleeping, and during that sleep the BlockManager had
already recovered the corrupt block.
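The race can be reproduced deterministically outside HDFS. In this hypothetical sketch, ScriptedCorruptCount stands in for countReplicas(namesystem, blk).corruptReplicas() and simply replays a fixed sequence of values (0 = not yet reported, 1 = reported, 0 = already recovered), so the interleaving described above happens on every run. It is a simplified model of the test, not HDFS code, and the sleeps are omitted.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class DoubleReadRaceDemo {

    /** Hypothetical stand-in for countReplicas(...).corruptReplicas(): replays scripted values. */
    static final class ScriptedCorruptCount {
        private final Deque<Integer> script;
        private int last = 0;

        ScriptedCorruptCount(Integer... values) {
            this.script = new ArrayDeque<>(Arrays.asList(values));
        }

        /** Returns the next scripted value; repeats the last one once the script is used up. */
        int corruptReplicas() {
            if (!script.isEmpty()) {
                last = script.poll();
            }
            return last;
        }
    }

    /** Original pattern: wait until the count is positive, then read it AGAIN for the assertion. */
    static int doubleRead(ScriptedCorruptCount counter) {
        while (counter.corruptReplicas() <= 0) {
            // the real test sleeps here (Thread.sleep(100)); omitted to keep the demo fast
        }
        return counter.corruptReplicas(); // second read may observe a later (recovered) state
    }

    /** Fixed pattern: keep the value that satisfied the wait and assert on that value. */
    static int singleRead(ScriptedCorruptCount counter) {
        int observed;
        while ((observed = counter.corruptReplicas()) <= 0) {
            // keep polling
        }
        return observed;
    }

    public static void main(String[] args) {
        // Script: not yet reported (0), reported (1), already recovered (0).
        System.out.println("double read sees: " + doubleRead(new ScriptedCorruptCount(0, 1, 0))); // prints "double read sees: 0"
        System.out.println("single read sees: " + singleRead(new ScriptedCorruptCount(0, 1, 0))); // prints "single read sees: 1"
    }
}
```

The double read returns the third scripted value (0), which is exactly the "expected:<1> but was:<0>" failure; the single read keeps the value (1) that ended the wait.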

So I made the following changes:
1) As soon as a corrupt replica is detected, exit the loop without calling sleep(); the same
applies to the wait on liveReplicas.
2) Do not call countReplicas() a second time in assertEquals() to re-read corruptReplicas and
liveReplicas.
3) Because the test case sometimes timed out, I shortened the block report interval.
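A minimal sketch of points 1) and 2), assuming a generic helper rather than anything in the Hadoop test utilities: pollUntil (a hypothetical name) reads a value, returns it as soon as the condition holds without sleeping again, and throws after a deadline so a wait that never succeeds produces a clear error. The probe lambda is a hypothetical stand-in for countReplicas(namesystem, blk).corruptReplicas(); the same helper would serve the liveReplicas wait.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.IntPredicate;
import java.util.function.IntSupplier;

public class PollOnce {

    /**
     * Reads probe until done accepts the value and returns that value.
     * No sleep after success, and the caller never needs to read the probe again.
     */
    static int pollUntil(IntSupplier probe, IntPredicate done, long timeoutMs, long intervalMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            int value = probe.getAsInt();
            if (done.test(value)) {
                return value; // return immediately: no extra sleep, no second read
            }
            if (System.nanoTime() - deadline >= 0) { // overflow-safe deadline check
                throw new TimeoutException(
                        "condition not met within " + timeoutMs + " ms; last value " + value);
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical probe: reports a corrupt replica from the third read onward.
        int[] calls = {0};
        IntSupplier corruptReplicas = () -> ++calls[0] >= 3 ? 1 : 0;

        int corrupt = pollUntil(corruptReplicas, v -> v > 0, 5_000, 100);
        if (corrupt != 1) {
            throw new AssertionError("There should be 1 replica in the corruptReplicasMap");
        }
        System.out.println("corruptReplicas observed: " + corrupt);
    }
}
```

In the real test, the assertion would then use the returned value (for example assertEquals(1, corrupt)) instead of calling countReplicas() again.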


> Double call countReplicas() to fetch corruptReplicas and liveReplicas is not needed
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-4815
>                 URL: https://issues.apache.org/jira/browse/HDFS-4815
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Tian Hong Wang
>            Assignee: Tian Hong Wang
>              Labels: patch
>         Attachments: HDFS-4815.patch

--
This message is automatically generated by JIRA.
