Date: Tue, 16 Apr 2013 19:11:17 +0000 (UTC)
From: "Jonathan Hsieh (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-8349) TestLogRolling#TestLogRollOnDatanodeDeath hangs under hadoop2 profile

     [ https://issues.apache.org/jira/browse/HBASE-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-8349:
----------------------------------
    Resolution: Fixed
  Hadoop Flags: Reviewed
        Status: Resolved  (was: Patch Available)

> TestLogRolling#TestLogRollOnDatanodeDeath hangs under hadoop2 profile
> ---------------------------------------------------------------------
>
>                 Key: HBASE-8349
>                 URL: https://issues.apache.org/jira/browse/HBASE-8349
>             Project: HBase
>          Issue Type: Sub-task
>          Components: hadoop2
>    Affects Versions: 0.98.0, 0.95.0
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: 0.98.0, 0.95.1
>
>         Attachments: hbase-8349.patch
>
>
> TestLogRolling has been hanging -- after a data node is killed, the client attempts to recover the lease and fails forever. (This example had been running for a while and shows recovery attempt 541888.)
> {code}
> 2013-04-15 16:37:49,074 INFO  [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(72): Attempt 541888 to recoverLease on file hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta returned false, trying for 2642865ms
> 2013-04-15 16:37:49,075 ERROR [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(86): Can't recoverLease after 541888 attempts and 2642866ms for hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta - continuing without the lease, but we could have a data loss.
> 2013-04-15 16:37:49,075 INFO  [IPC Server handler 9 on 41333] namenode.FSNamesystem(1957): recoverLease: recover lease [Lease. Holder: DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091, pendingcreates: 1], src=/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta from client DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091
> 2013-04-15 16:37:49,075 INFO  [IPC Server handler 9 on 41333] namenode.FSNamesystem(2981): Recovering lease=[Lease. Holder: DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091, pendingcreates: 1], src=/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta
> 2013-04-15 16:37:49,078 WARN  [IPC Server handler 9 on 41333] namenode.FSNamesystem(3096): DIR* NameSystem.internalReleaseLease: File /user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta has not been closed. Lease recovery is in progress. RecoveryId = 543317 for block blk_7636447875270454121_1019{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[127.0.0.1:38288|RBW], ReplicaUnderConstruction[127.0.0.1:35956|RWR]]}
> 2013-04-15 16:37:49,078 INFO  [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(72): Attempt 541889 to recoverLease on file hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta returned false, trying for 2642869ms
> 2013-04-15 16:37:49,079 ERROR [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(86): Can't recoverLease after 541889 attempts and 2642870ms for hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta - continuing without the lease, but we could have a data loss.
> 2013-04-15 16:37:49,079 INFO  [IPC Server handler 4 on 41333] namenode.FSNamesystem(1957): recoverLease: recover lease [Lease. Holder: DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091, pendingcreates: 1], src=/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta from client DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091
> 2013-04-15 16:37:49,079 INFO  [IPC Server handler 4 on 41333] namenode.FSNamesystem(2981): Recovering lease=[Lease. Holder: DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091, pendingcreates: 1], src=/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta
> 2013-04-15 16:37:49,082 WARN  [IPC Server handler 4 on 41333] namenode.FSNamesystem(3096): DIR* NameSystem.internalReleaseLease: File /user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta has not been closed. Lease recovery is in progress. RecoveryId = 543318 for block blk_7636447875270454121_1019{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[127.0.0.1:38288|RBW], ReplicaUnderConstruction[127.0.0.1:35956|RWR]]}
> 2013-04-15 16:37:49,083 INFO  [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(72): Attempt 541890 to recoverLease on file hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta returned false, trying for 2642874ms
> 2013-04-15 16:37:49,083 ERROR [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(86): Can't recoverLease after 541890 attempts and 2642874ms for hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta - continuing without the lease, but we could have a data loss.
> {code}
> It initially starts with permissions errors similar to HBASE-7636. For now we disable this test; the underlying problem will be addressed by a fix in HBASE-8337, with an assist from the HDFS folks.
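
The hang comes from the retry loop around DistributedFileSystem#recoverLease that the log above shows spinning past attempt 541888. As a rough illustration only -- not HBase's actual FSHDFSUtils code and not the HBASE-8337 fix -- a bounded variant might look like the Java sketch below; the class name, attempt limit, and pause parameters are hypothetical.

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Illustrative sketch only: shows the shape of a lease-recovery loop that
// gives up after a bounded number of attempts instead of retrying forever,
// as seen in the log above. Not the real FSHDFSUtils implementation.
public final class BoundedLeaseRecoverySketch {

  /**
   * Ask the NameNode to recover the lease on {@code path}, retrying up to
   * {@code maxAttempts} times with {@code pauseMs} between attempts.
   * Returns true once the lease is recovered, false if we give up.
   */
  public static boolean recoverLease(FileSystem fs, Path path,
      int maxAttempts, long pauseMs) throws IOException, InterruptedException {
    if (!(fs instanceof DistributedFileSystem)) {
      return true; // local or other filesystems: no HDFS lease to recover
    }
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      // recoverLease() returns true once the file is closed and safe to read.
      if (dfs.recoverLease(path)) {
        return true;
      }
      Thread.sleep(pauseMs); // let block recovery on the DataNodes make progress
    }
    return false; // bounded: give up rather than looping on attempt 541888+
  }
}
{code}

The real behaviour in FSHDFSUtils, and the eventual fix, differ in detail; see HBASE-8337 for the actual change.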
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira