hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10336) TestBalancer failing intermittently because of not reseting UserGroupInformation completely
Date Tue, 28 Jun 2016 13:24:57 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-10336:
    Attachment: HDFS-10336.002.patch

Thanks [~rakeshr] for the review.
Increasing the timeout is one approach, but I am interested to know the reason behind the
300000 ms timeout. Did you see any specific case exceeding the current value?
I tested many times locally; it looks good and runs quickly. I'm not sure which case would
exceed 30s now. Posting the patch for your comments.

> TestBalancer failing intermittently because of not reseting UserGroupInformation completely
> -------------------------------------------------------------------------------------------
>                 Key: HDFS-10336
>                 URL: https://issues.apache.org/jira/browse/HDFS-10336
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10336.001.patch, HDFS-10336.002.patch
> The unit test {{TestBalancer}} fails intermittently.
> I investigated and found two main causes.
> * 1st. The test {{TestBalancer#testBalancerWithKeytabs}} timed out.
> {code}
> org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 300.41 sec  <<< ERROR!
> java.lang.Exception: test timed out after 300000 milliseconds
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
> 	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
> 	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
> 	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
> 	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
> {code}
> * 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes did not reset the {{UGI}}
> completely in its finally block. This caused other unit tests to throw {{IOException}},
> like this:
> {code}
> testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 0 sec  <<< ERROR!
> java.io.IOException: Running in secure mode, but config doesn't have a keytab
> 	at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
> {code}
> More than one test can be affected by this. We should add the following line before the
> existing {{UGI}} reset operation to avoid the potential exception:
> {code}
> UserGroupInformation.reset();
> {code}
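
The second failure mode above (static state left half-reset by one test, breaking the next) can be illustrated with a small sketch. Everything here is an assumption for illustration: {{FakeUgi}} is a hypothetical stand-in with two static fields, not Hadoop's {{UserGroupInformation}}, and its method names, the principal string, and the exception message are invented. It only shows why a complete reset in the finally block matters.

```java
import java.io.IOException;

public class UgiResetSketch {

    // Hypothetical stand-in for process-wide static security state.
    // This is NOT Hadoop's UserGroupInformation; it only mimics the idea
    // that several static fields must all be cleared between tests.
    static final class FakeUgi {
        private static boolean securityEnabled = false;
        private static String loginPrincipal = null;

        static void loginFromKeytab(String principal) {
            securityEnabled = true;
            loginPrincipal = principal;
        }

        // Incomplete cleanup: drops the login but forgets the security flag.
        static void clearLoginOnly() {
            loginPrincipal = null;
        }

        // Complete cleanup: restores every field to its default.
        static void reset() {
            securityEnabled = false;
            loginPrincipal = null;
        }

        // What a later, non-secure test might hit when it starts up.
        static void startInsecureService() throws IOException {
            if (securityEnabled && loginPrincipal == null) {
                throw new IOException("secure mode is on, but no keytab login is present");
            }
        }
    }

    // Simulates a secure test followed by an unrelated test.
    // Returns true if the second test runs without an IOException.
    static boolean secondTestPasses(boolean fullReset) {
        try {
            FakeUgi.loginFromKeytab("balancer/localhost@EXAMPLE.COM");
            // ... the body of the secure test would run here ...
        } finally {
            if (fullReset) {
                FakeUgi.reset();
            } else {
                FakeUgi.clearLoginOnly();
            }
        }
        try {
            FakeUgi.startInsecureService();
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            FakeUgi.reset(); // leave the sketch itself in a clean state
        }
    }

    public static void main(String[] args) {
        System.out.println("partial cleanup, next test passes: " + secondTestPasses(false));
        System.out.println("full reset, next test passes: " + secondTestPasses(true));
    }
}
```

With the partial cleanup the simulated follow-up test fails, and with the full reset it passes. How the real {{UGI}} state leaks between tests is not modeled here.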
