Message-ID: <1953696251.1244484007347.JavaMail.jira@brutus>
Date: Mon, 8 Jun 2009 11:00:07 -0700 (PDT)
From: "Jonathan Gray (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-1314) master sees HRS znode expire and splits log while the HRS is still running and accepting edits
In-Reply-To: <597626207.1239079812859.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/HBASE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717344#action_12717344 ]

Jonathan Gray commented on HBASE-1314:
--------------------------------------

Current status of this issue? It has not been discussed in a month; we need to resolve it or punt it from 0.20.0.

> master sees HRS znode expire and splits log while the HRS is still running and accepting edits
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1314
>                 URL: https://issues.apache.org/jira/browse/HBASE-1314
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>             Fix For: 0.20.0
>
>
> ZK session expiration related problem. HRS loses its ephemeral node while it is still up and running and accepting edits. Master sees it go away and starts splitting its logs while edits are still being written. After this, all reconstruction logs have to be manually removed from the region directories or the regions will never deploy (CRC errors). I think on HDFS edits would be lost, not corrupted. (I am using an HBase root on the local file system.)
> HRS ZK session expires, causing its znode to go away:
> 2009-04-07 03:50:39,953 INFO org.apache.hadoop.hbase.master.ServerManager: localhost.localdomain_1239068648333_60020 znode expired
> 2009-04-07 03:50:40,565 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessServerShutdown of localhost.localdomain_1239068648333_60020
> 2009-04-07 03:50:40,637 INFO org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of server localhost.localdomain_1239068648333_60020: logSplit: false, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> But here we have the HRS still reporting in, triggering a LeaseStillHeldException:
> 2009-04-07 03:50:40,826 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000, call regionServerReport(address: 127.0.0.1:60020, startcode: 1239068648333, load: (requests=14, regions=7, usedHeap=582, maxHeap=888), [Lorg.apache.hadoop.hbase.HMsg;@6da21389, [Lorg.apache.hadoop.hbase.HRegionInfo;@2bb0bf9a) from 127.0.0.1:39238: error: org.apache.hadoop.hbase.Leases$LeaseStillHeldException
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
>       at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:198)
>       at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:601)
>       at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>       at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:909)
> And log splitting starts anyway:
> 2009-04-07 03:50:41,139 INFO org.apache.hadoop.hbase.regionserver.HLog: Splitting 3 log(s) in file:/data/hbase/log_localhost.localdomain_1239068648333_60020
> 2009-04-07 03:50:41,139 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 1 of 3: file:/data/hbase/log_localhost.localdomain_1239068648333_60020/hlog.dat.1239075060711
> [...]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
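
For readers following the sequence in the quoted description, here is a minimal sketch of the race in Python. It is not HBase code: `RegionServer`, `Master`, `append_edit`, `on_znode_expired`, and `simulate` are hypothetical names, and an in-memory list stands in for the server's log. It only illustrates the "edits would be lost" case mentioned in the report, where the master splits a log that still has a live writer; it does not model the CRC errors seen on the local file system.

```python
class RegionServer:
    """Hypothetical stand-in for an HRS that keeps writing after its znode is gone."""

    def __init__(self, name):
        self.name = name
        self.log = []  # stands in for the server's edit log

    def append_edit(self, edit):
        # The server is unaware that its session expired, so it keeps accepting edits.
        self.log.append(edit)


class Master:
    """Hypothetical stand-in for the master's reaction to a znode expiry."""

    def __init__(self):
        self.split_output = {}

    def on_znode_expired(self, server):
        # The master treats the missing znode as a dead server and splits
        # whatever the log contains at this instant (a snapshot copy).
        self.split_output[server.name] = list(server.log)


def simulate():
    hrs = RegionServer("hrs-1")
    master = Master()
    hrs.append_edit("edit-1")
    master.on_znode_expired(hrs)  # session expired; master starts splitting
    hrs.append_edit("edit-2")     # HRS is still up and accepts another edit
    split = master.split_output[hrs.name]
    missed = [e for e in hrs.log if e not in split]
    return split, missed
```

Calling `simulate()` returns the edits the split captured and the edits written afterwards. Any guard against this (for example, the server stopping writes once it learns its session expired) is a design question for the issue itself and is not shown here.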