Return-Path: Delivered-To: apmail-hadoop-hbase-dev-archive@minotaur.apache.org Received: (qmail 11859 invoked from network); 5 Mar 2009 19:56:17 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 5 Mar 2009 19:56:17 -0000 Received: (qmail 42303 invoked by uid 500); 5 Mar 2009 19:56:17 -0000 Delivered-To: apmail-hadoop-hbase-dev-archive@hadoop.apache.org Received: (qmail 42281 invoked by uid 500); 5 Mar 2009 19:56:17 -0000 Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hbase-dev@hadoop.apache.org Delivered-To: mailing list hbase-dev@hadoop.apache.org Received: (qmail 42270 invoked by uid 99); 5 Mar 2009 19:56:17 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 05 Mar 2009 11:56:17 -0800 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 05 Mar 2009 19:56:16 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 2F643234C4AF for ; Thu, 5 Mar 2009 11:55:56 -0800 (PST) Message-ID: <1079218825.1236282956193.JavaMail.jira@brutus> Date: Thu, 5 Mar 2009 11:55:56 -0800 (PST) From: "stack (JIRA)" To: hbase-dev@hadoop.apache.org Subject: [jira] Commented: (HBASE-1169) During shutdown, odd handling of regionserver hosting META In-Reply-To: <303730879.1233427319704.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HBASE-1169?page=3Dcom.atlassian= .jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D1267= 9325#action_12679325 ]=20 stack commented on HBASE-1169: ------------------------------ Should we move to 0.19.2 and 0.20.0 and out of 0.19.1? > During shutdown, odd handling of regionserver hosting META > ---------------------------------------------------------- > > Key: HBASE-1169 > URL: https://issues.apache.org/jira/browse/HBASE-1169 > Project: Hadoop HBase > Issue Type: Bug > Affects Versions: 0.19.0, 0.20.0 > Reporter: Jonathan Gray > Priority: Critical > Fix For: 0.19.1, 0.20.0 > > > During shutdown of cluster, half way through quiescing servers there is a= META scan in the master. The regions from servers whose leases are alread= y canceled show up as invalid. (72.34.249.208 is hosting META) > {code} > 2009-01-31 10:25:42,571 INFO org.apache.hadoop.hbase.master.HMaster: Clus= ter shutdown requested. Starting to quiesce servers > 2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager= : Cancelling lease for 72.34.249.211:60020 > 2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager= : Region server 72.34.249.211:60020: MSG_REPORT_EXITING -- lease cancelled > 2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager= : Cancelling lease for 72.34.249.216:60020 > 2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager= : Region server 72.34.249.216:60020: MSG_REPORT_EXITING -- lease cancelled > 2009-01-31 10:25:47,840 INFO org.apache.hadoop.hbase.master.ServerManager= : Region server 72.34.249.210:60020 quiesced > 2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager= : Cancelling lease for 72.34.249.215:60020 > 2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager= : Region server 72.34.249.215:60020: MSG_REPORT_EXITING -- lease cancelled > 2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager= : Cancelling lease for 72.34.249.213:60020 > 2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager= : Region server 72.34.249.213:60020: MSG_REPORT_EXITING -- lease cancelled > 2009-01-31 10:25:49,378 INFO org.apache.hadoop.hbase.master.ServerManager= : Region server 72.34.249.218:60020 quiesced > 2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager= : Cancelling lease for 72.34.249.214:60020 > 2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager= : Region server 72.34.249.214:60020: MSG_REPORT_EXITING -- lease cancelled > 2009-01-31 10:25:59,531 INFO org.apache.hadoop.hbase.master.BaseScanner: = RegionManager.metaScanner scanning meta region {regionname: .META.,,1, star= tKey: <>, server: 72.34.249.218:60020} > 2009-01-31 10:25:59,544 DEBUG org.apache.hadoop.hbase.master.BaseScanner:= Current assignment of activitydupehash,,1229364212541 is not valid; Serve= r '72.34.249.214:60020' unknown. > 2009-01-31 10:25:59,545 DEBUG org.apache.hadoop.hbase.master.BaseScanner:= Current assignment of api,,1229364235220 is not valid; Server '72.34.249.= 216:60020' unknown. > 2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner:= Current assignment of apps,,1229364222879 is not valid; Server '72.34.249= .215:60020' unknown. > 2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner:= Current assignment of assigners,,1229364037757 is not valid; Server '72.3= 4.249.214:60020' unknown. > 2009-01-31 10:25:59,554 DEBUG org.apache.hadoop.hbase.master.BaseScanner:= Current assignment of canoncache,,1229364041955 is not valid; Server '72.= 34.249.215:60020' unknown. > 2009-01-31 10:25:59,555 DEBUG org.apache.hadoop.hbase.master.BaseScanner:= Current assignment of chunks,,1229390225893 is not valid; Server '72.34.2= 49.211:60020' unknown. > {code} > Shutdown then continues as the last servers are quiesced, but at the same= time the Master expires the lease on the regionserver that was hosting MET= A and that it just scanned. It then starts to replay the logs for that reg= ionserver in the middle of the shutdown. > {code} > 2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: = RegionManager.metaScanner scan of 512 row(s) of meta region {regionname: .M= ETA.,,1, startKey: <>, server: 72.34.249.218:60020} > complete > 2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: = All 1 .META. region(s) scanned > 2009-01-31 10:26:59,530 INFO org.apache.hadoop.hbase.master.BaseScanner: = RegionManager.metaScanner scanning meta region {regionname: .META.,,1, star= tKey: <>, server: 72.34.249.218:60020} > 2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: = RegionManager.metaScanner scan of 512 row(s) of meta region {regionname: .M= ETA.,,1, startKey: <>, server: 72.34.249.218:60020} > complete > 2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: = All 1 .META. region(s) scanned > 2009-01-31 10:27:40,374 INFO org.apache.hadoop.hbase.master.ServerManager= : 72.34.249.218:60020 lease expired > 2009-01-31 10:27:40,375 DEBUG org.apache.hadoop.hbase.master.HMaster: Pro= cessing todo: ProcessServerShutdown of 72.34.249.218:60020 > 2009-01-31 10:27:40,375 INFO org.apache.hadoop.hbase.master.RegionServerO= peration: process shutdown of server 72.34.249.218:60020: logSplit: false, = rootRescanned: false, numberOfMetaRegions: 1, onlin > eMetaRegions.size(): 1 > 2009-01-31 10:27:40,387 INFO org.apache.hadoop.hbase.regionserver.HLog: S= plitting 44 log(s) in hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351= _60020 > 2009-01-31 10:27:40,387 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Splitting 1 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60= 020/hlog.dat.1232996040603 > 2009-01-31 10:27:40,443 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Creating new log file writer for path hdfs://mb0:9000/hbase/.META./10287851= 92/oldlogfile.log and region .META.,,1 > 2009-01-31 10:27:40,575 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Creating new log file writer for path hdfs://mb0:9000/hbase/sources/6712251= 15/oldlogfile.log and region sources,,1229364117966 > 2009-01-31 10:27:41,171 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Applied 100003 total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_123= 2996040351_60020/hlog.dat.1232996040603 > 2009-01-31 10:27:41,173 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Splitting 2 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60= 020/hlog.dat.1233093726382 > 2009-01-31 10:27:41,429 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Creating new log file writer for path hdfs://mb0:9000/hbase/dupehash/160753= 2582/oldlogfile.log and region dupehash,O=EF=BF=BD=EF=BF=BD 31779694744 > 2009-01-31 10:27:41,462 INFO org.apache.hadoop.hbase.master.ServerManager= : 72.34.249.217:60020 lease expired > 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.ServerManager= : All user tables quiesced. Proceeding with shutdown > 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: telling root scanner to stop > 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: telling meta scanner to stop > 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: meta and root scanners notified > 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.RootScanner: = RegionManager.rootScanner exiting > 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.MetaScanner: = RegionManager.metaScanner exiting > 2009-01-31 10:27:41,780 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Applied 100001 total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_123= 2996040351_60020/hlog.dat.1233093726382 > 2009-01-31 10:27:41,781 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Splitting 3 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60= 020/hlog.dat.1233101015153 > 2009-01-31 10:27:41,838 INFO org.apache.hadoop.hbase.master.ServerManager= : 72.34.249.210:60020 lease expired > 2009-01-31 10:27:41,866 INFO org.apache.hadoop.hbase.master.ServerManager= : All user tables quiesced. Proceeding with shutdown > 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: telling root scanner to stop > 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: telling meta scanner to stop > 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: meta and root scanners notified > 2009-01-31 10:27:42,557 INFO org.apache.hadoop.hbase.master.ServerManager= : 72.34.249.212:60020 lease expired > 2009-01-31 10:27:42,581 INFO org.apache.hadoop.hbase.master.ServerManager= : All user tables quiesced. Proceeding with shutdown > 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: telling root scanner to stop > 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: telling meta scanner to stop > 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: meta and root scanners notified > 2009-01-31 10:27:42,615 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Applied 100002 total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_123= 2996040351_60020/hlog.dat.1233101015153 > 2009-01-31 10:27:42,618 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Splitting 4 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60= 020/hlog.dat.1233111791302 > 2009-01-31 10:27:43,356 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Applied 100001 total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_123= 2996040351_60020/hlog.dat.1233111791302 > 2009-01-31 10:27:43,359 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Splitting 5 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60= 020/hlog.dat.1233122447841 > 2009-01-31 10:27:43,404 INFO org.apache.hadoop.hbase.master.ServerManager= : All user tables quiesced. Proceeding with shutdown > 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: telling root scanner to stop > 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: telling meta scanner to stop > 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: meta and root scanners notified > 2009-01-31 10:27:43,991 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Applied 100001 total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_123= 2996040351_60020/hlog.dat.1233122447841 > {code} > During the log replay, a log file was missing from HDFS. Not sure why, t= here was a Datanode crash that could be related. More importantly, once it= trips on the missing file it stops the replay (even though there's another= 37 logs). > {code} > 2009-01-31 10:27:43,992 DEBUG org.apache.hadoop.hbase.regionserver.HLog: = Splitting 6 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60= 020/hlog.dat.1233132556827 > 2009-01-31 10:27:44,022 WARN org.apache.hadoop.hbase.master.HMaster: Proc= essing pending operations: ProcessServerShutdown of 72.34.249.218:60020 > java.io.FileNotFoundException: File does not exist: hdfs://mb0:9000/hbase= /log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827 > at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(Dis= tributedFileSystem.java:394) > at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679) > at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.j= ava:1417) > at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.j= ava:1412) > at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:7= 42) > at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:7= 05) > at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(P= rocessServerShutdown.java:249) > at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaste= r.java:427) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:360) > 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: telling root scanner to stop > 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: telling meta scanner to stop > 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManage= r: meta and root scanners notified > 2009-01-31 10:27:44,023 DEBUG org.apache.hadoop.hbase.RegionHistorian: Of= flined > 2009-01-31 10:27:44,023 INFO org.apache.hadoop.hbase.master.HMaster: Stop= ping infoServer > 2009-01-31 10:27:44,023 INFO org.mortbay.util.ThreadedServer: Stopping Ac= ceptor ServerSocket[addr=3D0.0.0.0/0.0.0.0,port=3D0,localport=3D60010] > 2009-01-31 10:27:44,026 INFO org.mortbay.http.SocketListener: Stopped Soc= ketListener on 0.0.0.0:60010 > {code} --=20 This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.