Return-Path: Delivered-To: apmail-hadoop-hbase-dev-archive@minotaur.apache.org Received: (qmail 71640 invoked from network); 31 Jan 2009 18:42:21 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 31 Jan 2009 18:42:21 -0000 Received: (qmail 47152 invoked by uid 500); 31 Jan 2009 18:42:20 -0000 Delivered-To: apmail-hadoop-hbase-dev-archive@hadoop.apache.org Received: (qmail 47143 invoked by uid 500); 31 Jan 2009 18:42:20 -0000 Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hbase-dev@hadoop.apache.org Delivered-To: mailing list hbase-dev@hadoop.apache.org Received: (qmail 47132 invoked by uid 99); 31 Jan 2009 18:42:20 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 31 Jan 2009 10:42:20 -0800 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 31 Jan 2009 18:42:20 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id AC494234C4AC for ; Sat, 31 Jan 2009 10:41:59 -0800 (PST) Message-ID: <303730879.1233427319704.JavaMail.jira@brutus> Date: Sat, 31 Jan 2009 10:41:59 -0800 (PST) From: "Jonathan Gray (JIRA)" To: hbase-dev@hadoop.apache.org Subject: [jira] Created: (HBASE-1169) During shutdown, odd handling of regionserver hosting META MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org During shutdown, odd handling of regionserver hosting META ---------------------------------------------------------- Key: HBASE-1169 URL: https://issues.apache.org/jira/browse/HBASE-1169 Project: Hadoop HBase Issue Type: Bug Affects Versions: 0.19.0, 0.20.0 Reporter: Jonathan Gray Fix For: 0.19.1, 0.20.0 During shutdown of cluster, half way through quiescing servers there is a M= ETA scan in the master. The regions from servers whose leases are already = canceled show up as invalid. (72.34.249.208 is hosting META) {code} 2009-01-31 10:25:42,571 INFO org.apache.hadoop.hbase.master.HMaster: Cluste= r shutdown requested. Starting to quiesce servers 2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: = Cancelling lease for 72.34.249.211:60020 2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: = Region server 72.34.249.211:60020: MSG_REPORT_EXITING -- lease cancelled 2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: = Cancelling lease for 72.34.249.216:60020 2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: = Region server 72.34.249.216:60020: MSG_REPORT_EXITING -- lease cancelled 2009-01-31 10:25:47,840 INFO org.apache.hadoop.hbase.master.ServerManager: = Region server 72.34.249.210:60020 quiesced 2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: = Cancelling lease for 72.34.249.215:60020 2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: = Region server 72.34.249.215:60020: MSG_REPORT_EXITING -- lease cancelled 2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: = Cancelling lease for 72.34.249.213:60020 2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: = Region server 72.34.249.213:60020: MSG_REPORT_EXITING -- lease cancelled 2009-01-31 10:25:49,378 INFO org.apache.hadoop.hbase.master.ServerManager: = Region server 72.34.249.218:60020 quiesced 2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: = Cancelling lease for 72.34.249.214:60020 2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: = Region server 72.34.249.214:60020: MSG_REPORT_EXITING -- lease cancelled 2009-01-31 10:25:59,531 INFO org.apache.hadoop.hbase.master.BaseScanner: Re= gionManager.metaScanner scanning meta region {regionname: .META.,,1, startK= ey: <>, server: 72.34.249.218:60020} 2009-01-31 10:25:59,544 DEBUG org.apache.hadoop.hbase.master.BaseScanner: C= urrent assignment of activitydupehash,,1229364212541 is not valid; Server = '72.34.249.214:60020' unknown. 2009-01-31 10:25:59,545 DEBUG org.apache.hadoop.hbase.master.BaseScanner: C= urrent assignment of api,,1229364235220 is not valid; Server '72.34.249.21= 6:60020' unknown. 2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: C= urrent assignment of apps,,1229364222879 is not valid; Server '72.34.249.2= 15:60020' unknown. 2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: C= urrent assignment of assigners,,1229364037757 is not valid; Server '72.34.= 249.214:60020' unknown. 2009-01-31 10:25:59,554 DEBUG org.apache.hadoop.hbase.master.BaseScanner: C= urrent assignment of canoncache,,1229364041955 is not valid; Server '72.34= .249.215:60020' unknown. 2009-01-31 10:25:59,555 DEBUG org.apache.hadoop.hbase.master.BaseScanner: C= urrent assignment of chunks,,1229390225893 is not valid; Server '72.34.249= .211:60020' unknown. {/code} Shutdown then continues as the last servers are quiesced, but at the same t= ime the Master expires the lease on the regionserver that was hosting META = and that it just scanned. It then starts to replay the logs for that regio= nserver in the middle of the shutdown. {code} 2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: Re= gionManager.metaScanner scan of 512 row(s) of meta region {regionname: .MET= A.,,1, startKey: <>, server: 72.34.249.218:60020} complete 2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: Al= l 1 .META. region(s) scanned 2009-01-31 10:26:59,530 INFO org.apache.hadoop.hbase.master.BaseScanner: Re= gionManager.metaScanner scanning meta region {regionname: .META.,,1, startK= ey: <>, server: 72.34.249.218:60020} 2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: Re= gionManager.metaScanner scan of 512 row(s) of meta region {regionname: .MET= A.,,1, startKey: <>, server: 72.34.249.218:60020} complete 2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: Al= l 1 .META. region(s) scanned 2009-01-31 10:27:40,374 INFO org.apache.hadoop.hbase.master.ServerManager: = 72.34.249.218:60020 lease expired 2009-01-31 10:27:40,375 DEBUG org.apache.hadoop.hbase.master.HMaster: Proce= ssing todo: ProcessServerShutdown of 72.34.249.218:60020 2009-01-31 10:27:40,375 INFO org.apache.hadoop.hbase.master.RegionServerOpe= ration: process shutdown of server 72.34.249.218:60020: logSplit: false, ro= otRescanned: false, numberOfMetaRegions: 1, onlin eMetaRegions.size(): 1 2009-01-31 10:27:40,387 INFO org.apache.hadoop.hbase.regionserver.HLog: Spl= itting 44 log(s) in hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_6= 0020 2009-01-31 10:27:40,387 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Sp= litting 1 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_6002= 0/hlog.dat.1232996040603 2009-01-31 10:27:40,443 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Cr= eating new log file writer for path hdfs://mb0:9000/hbase/.META./1028785192= /oldlogfile.log and region .META.,,1 2009-01-31 10:27:40,575 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Cr= eating new log file writer for path hdfs://mb0:9000/hbase/sources/671225115= /oldlogfile.log and region sources,,1229364117966 2009-01-31 10:27:41,171 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Ap= plied 100003 total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_12329= 96040351_60020/hlog.dat.1232996040603 2009-01-31 10:27:41,173 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Sp= litting 2 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_6002= 0/hlog.dat.1233093726382 2009-01-31 10:27:41,429 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Cr= eating new log file writer for path hdfs://mb0:9000/hbase/dupehash/16075325= 82/oldlogfile.log and region dupehash,O=EF=BF=BD=EF=BF=BD(SequenceFile.jav= a:1417) at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.jav= a:1412) at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:742= ) at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:705= ) at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(Pro= cessServerShutdown.java:249) at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.= java:427) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:360) 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager:= telling root scanner to stop 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager:= telling meta scanner to stop 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager:= meta and root scanners notified 2009-01-31 10:27:44,023 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offl= ined 2009-01-31 10:27:44,023 INFO org.apache.hadoop.hbase.master.HMaster: Stoppi= ng infoServer 2009-01-31 10:27:44,023 INFO org.mortbay.util.ThreadedServer: Stopping Acce= ptor ServerSocket[addr=3D0.0.0.0/0.0.0.0,port=3D0,localport=3D60010] 2009-01-31 10:27:44,026 INFO org.mortbay.http.SocketListener: Stopped Socke= tListener on 0.0.0.0:60010 {/code} --=20 This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.