Date: Tue, 28 Jun 2016 13:26:57 +0000 (UTC)
From: "Yiqun Lin (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Comment Edited] (HDFS-10336) TestBalancer failing intermittently because of not reseting UserGroupInformation completely
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
archived-at: Tue, 28 Jun 2016 13:27:10 -0000

    [ https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352976#comment-15352976 ]

Yiqun Lin edited comment on HDFS-10336 at 6/28/16 1:25 PM:
-----------------------------------------------------------

Thanks
[~rakeshr] for review.
{quote}
Increasing timeout is one approach, but am interested to know the reason behind 300000millis timeout. Did you see any specific case for exceeding the current value?
{quote}
I ran the test many times locally; it passes and runs quickly. I'm not sure which case would exceed the 30s now. Posting the patch to address your comments.


was (Author: linyiqun):
Thanks [~rakeshr] for review.
{quote}
Increasing timeout is one approach, but am interested to know the reason behind 300000millis timeout. Did you see any specific case for exceeding the current value?
{quote}
I tested many times in my local, it seems good and runs quickly. I'm not so sure for the case that exceeding the 30s now. Post the patch for your the comments.

> TestBalancer failing intermittently because of not reseting UserGroupInformation completely
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10336
>                 URL: https://issues.apache.org/jira/browse/HDFS-10336
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10336.001.patch, HDFS-10336.002.patch
>
>
> The unit test {{TestBalancer}} fails intermittently.
> I looked into it and found two main causes.
> * 1st. The test {{TestBalancer#testBalancerWithKeytabs}} timed out.
> {code}
> org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer) Time elapsed: 300.41 sec <<< ERROR!
> java.lang.Exception: test timed out after 300000 milliseconds
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
> 	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
> 	at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
> 	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
> 	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
> {code}
> * 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes does not reset the {{UGI}} completely in its finally block. This causes other unit tests to throw {{IOException}}, like this:
> {code}
> testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer) Time elapsed: 0 sec <<< ERROR!
> java.io.IOException: Running in secure mode, but config doesn't have a keytab
> 	at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
> {code}
> More than one test is affected by this. We should add the following line before the {{UGI}} reset operation to avoid the potential exception:
> {code}
> UserGroupInformation.reset();
> {code}
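
As an illustration of the second cause in the quoted description, here is a minimal, self-contained sketch of how static state set by one test can leak into later tests when the finally block cleans up only part of it. It is not code from the attached patches and does not use the Hadoop API: {{FakeSecurityState}}, {{UgiResetSketch}}, {{runSecureTest}} and the principal name are hypothetical stand-ins, and the two static fields are an assumed simplification of whatever {{UserGroupInformation}} actually caches.

```java
/**
 * Hypothetical stand-in for process-wide security state.
 * This is NOT Hadoop's UserGroupInformation; it only models
 * "a security flag plus a cached login" as two static fields.
 */
final class FakeSecurityState {
    private static boolean securityEnabled = false;
    private static String cachedLoginUser = null;

    static void setSecure(boolean secure) { securityEnabled = secure; }

    static void login(String principal) { cachedLoginUser = principal; }

    /** Clears the cached login (the step the description proposes adding). */
    static void reset() { cachedLoginUser = null; }

    /** True only when no trace of a secure-mode test is left behind. */
    static boolean isClean() {
        return !securityEnabled && cachedLoginUser == null;
    }
}

final class UgiResetSketch {
    /**
     * Models a secure-mode test. When fullReset is false, the finally
     * block restores the flag but leaves the cached login in place.
     */
    static void runSecureTest(boolean fullReset) {
        try {
            FakeSecurityState.setSecure(true);
            FakeSecurityState.login("balancer/host@EXAMPLE.COM"); // made-up principal
            // ... test body would run here ...
        } finally {
            if (fullReset) {
                FakeSecurityState.reset(); // clear cached state first
            }
            FakeSecurityState.setSecure(false);
        }
    }

    public static void main(String[] args) {
        runSecureTest(false);
        System.out.println("clean after partial cleanup: " + FakeSecurityState.isClean());

        FakeSecurityState.reset(); // manual repair before the second run
        runSecureTest(true);
        System.out.println("clean after full cleanup: " + FakeSecurityState.isClean());
    }
}
```

In the sketch, the later tests would correspond to anything that checks {{isClean()}} before running; with the partial cleanup they would start from leftover secure-mode state, which is the kind of cross-test interference the description reports.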