hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-1169) During shutdown, odd handling of regionserver hosting META
Date Sat, 31 Jan 2009 18:45:59 GMT

     [ https://issues.apache.org/jira/browse/HBASE-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Gray updated HBASE-1169:
---------------------------------

    Description: 
During shutdown of cluster, half way through quiescing servers there is a META scan in the
master.  The regions from servers whose leases are already canceled show up as invalid.  (72.34.249.208
is hosting META)


{code}
2009-01-31 10:25:42,571 INFO org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested.
Starting to quiesce servers
2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease
for 72.34.249.211:60020
2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.211:60020:
MSG_REPORT_EXITING -- lease cancelled
2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease
for 72.34.249.216:60020
2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.216:60020:
MSG_REPORT_EXITING -- lease cancelled
2009-01-31 10:25:47,840 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.210:60020
quiesced
2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease
for 72.34.249.215:60020
2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.215:60020:
MSG_REPORT_EXITING -- lease cancelled
2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease
for 72.34.249.213:60020
2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.213:60020:
MSG_REPORT_EXITING -- lease cancelled
2009-01-31 10:25:49,378 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.218:60020
quiesced
2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease
for 72.34.249.214:60020
2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.214:60020:
MSG_REPORT_EXITING -- lease cancelled
2009-01-31 10:25:59,531 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scanning meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
2009-01-31 10:25:59,544 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of activitydupehash,,1229364212541 is not valid;  Server '72.34.249.214:60020' unknown.
2009-01-31 10:25:59,545 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of api,,1229364235220 is not valid;  Server '72.34.249.216:60020' unknown.
2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of apps,,1229364222879 is not valid;  Server '72.34.249.215:60020' unknown.
2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of assigners,,1229364037757 is not valid;  Server '72.34.249.214:60020' unknown.
2009-01-31 10:25:59,554 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of canoncache,,1229364041955 is not valid;  Server '72.34.249.215:60020' unknown.
2009-01-31 10:25:59,555 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of chunks,,1229390225893 is not valid;  Server '72.34.249.211:60020' unknown.
{code}


Shutdown then continues as the last servers are quiesced, but at the same time the Master
expires the lease on the regionserver that was hosting META and that it just scanned.  It
then starts to replay the logs for that regionserver in the middle of the shutdown.


{code}
2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scan of 512 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
 complete
2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s)
scanned
2009-01-31 10:26:59,530 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scanning meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scan of 512 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
 complete
2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s)
scanned
2009-01-31 10:27:40,374 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.218:60020
lease expired
2009-01-31 10:27:40,375 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessServerShutdown
of 72.34.249.218:60020
2009-01-31 10:27:40,375 INFO org.apache.hadoop.hbase.master.RegionServerOperation: process
shutdown of server 72.34.249.218:60020: logSplit: false, rootRescanned: false, numberOfMetaRegions:
1, onlin
eMetaRegions.size(): 1
2009-01-31 10:27:40,387 INFO org.apache.hadoop.hbase.regionserver.HLog: Splitting 44 log(s)
in hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020
2009-01-31 10:27:40,387 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 1 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
2009-01-31 10:27:40,443 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new log
file writer for path hdfs://mb0:9000/hbase/.META./1028785192/oldlogfile.log and region .META.,,1
2009-01-31 10:27:40,575 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new log
file writer for path hdfs://mb0:9000/hbase/sources/671225115/oldlogfile.log and region sources,,1229364117966
2009-01-31 10:27:41,171 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100003 total
edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
2009-01-31 10:27:41,173 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 2 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
2009-01-31 10:27:41,429 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new log
file writer for path hdfs://mb0:9000/hbase/dupehash/1607532582/oldlogfile.log and region dupehash,O��<L;h�,12
31779694744
2009-01-31 10:27:41,462 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.217:60020
lease expired
2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner
to stop
2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner
to stop
2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root
scanners notified
2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.RootScanner: RegionManager.rootScanner
exiting
2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.MetaScanner: RegionManager.metaScanner
exiting
2009-01-31 10:27:41,780 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001 total
edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
2009-01-31 10:27:41,781 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 3 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
2009-01-31 10:27:41,838 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.210:60020
lease expired
2009-01-31 10:27:41,866 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner
to stop
2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner
to stop
2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root
scanners notified
2009-01-31 10:27:42,557 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.212:60020
lease expired
2009-01-31 10:27:42,581 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner
to stop
2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner
to stop
2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root
scanners notified
2009-01-31 10:27:42,615 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100002 total
edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
2009-01-31 10:27:42,618 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 4 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
2009-01-31 10:27:43,356 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001 total
edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
2009-01-31 10:27:43,359 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 5 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
2009-01-31 10:27:43,404 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner
to stop
2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner
to stop
2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root
scanners notified
2009-01-31 10:27:43,991 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001 total
edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
{code}


During the log replay, a log file was missing from HDFS.  Not sure why, there was a Datanode
crash that could be related.  More importantly, once it trips on the missing file it stops
the replay (even though there's another 37 logs).


{code}
2009-01-31 10:27:43,992 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 6 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
2009-01-31 10:27:44,022 WARN org.apache.hadoop.hbase.master.HMaster: Processing pending operations:
ProcessServerShutdown of 72.34.249.218:60020
java.io.FileNotFoundException: File does not exist: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:394)
        at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:742)
        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:705)
        at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:249)
        at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:427)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:360)
2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner
to stop
2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner
to stop
2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root
scanners notified
2009-01-31 10:27:44,023 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offlined
2009-01-31 10:27:44,023 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
2009-01-31 10:27:44,023 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60010]
2009-01-31 10:27:44,026 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:60010
{code}


  was:
During shutdown of cluster, half way through quiescing servers there is a META scan in the
master.  The regions from servers whose leases are already canceled show up as invalid.  (72.34.249.208
is hosting META)

{code}
2009-01-31 10:25:42,571 INFO org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested.
Starting to quiesce servers
2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease
for 72.34.249.211:60020
2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.211:60020:
MSG_REPORT_EXITING -- lease cancelled
2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease
for 72.34.249.216:60020
2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.216:60020:
MSG_REPORT_EXITING -- lease cancelled
2009-01-31 10:25:47,840 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.210:60020
quiesced
2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease
for 72.34.249.215:60020
2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.215:60020:
MSG_REPORT_EXITING -- lease cancelled
2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease
for 72.34.249.213:60020
2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.213:60020:
MSG_REPORT_EXITING -- lease cancelled
2009-01-31 10:25:49,378 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.218:60020
quiesced
2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease
for 72.34.249.214:60020
2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.214:60020:
MSG_REPORT_EXITING -- lease cancelled
2009-01-31 10:25:59,531 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scanning meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
2009-01-31 10:25:59,544 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of activitydupehash,,1229364212541 is not valid;  Server '72.34.249.214:60020' unknown.
2009-01-31 10:25:59,545 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of api,,1229364235220 is not valid;  Server '72.34.249.216:60020' unknown.
2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of apps,,1229364222879 is not valid;  Server '72.34.249.215:60020' unknown.
2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of assigners,,1229364037757 is not valid;  Server '72.34.249.214:60020' unknown.
2009-01-31 10:25:59,554 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of canoncache,,1229364041955 is not valid;  Server '72.34.249.215:60020' unknown.
2009-01-31 10:25:59,555 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of chunks,,1229390225893 is not valid;  Server '72.34.249.211:60020' unknown.
{/code}

Shutdown then continues as the last servers are quiesced, but at the same time the Master
expires the lease on the regionserver that was hosting META and that it just scanned.  It
then starts to replay the logs for that regionserver in the middle of the shutdown.

{code}
2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scan of 512 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
 complete
2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s)
scanned
2009-01-31 10:26:59,530 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scanning meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scan of 512 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
 complete
2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s)
scanned
2009-01-31 10:27:40,374 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.218:60020
lease expired
2009-01-31 10:27:40,375 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessServerShutdown
of 72.34.249.218:60020
2009-01-31 10:27:40,375 INFO org.apache.hadoop.hbase.master.RegionServerOperation: process
shutdown of server 72.34.249.218:60020: logSplit: false, rootRescanned: false, numberOfMetaRegions:
1, onlin
eMetaRegions.size(): 1
2009-01-31 10:27:40,387 INFO org.apache.hadoop.hbase.regionserver.HLog: Splitting 44 log(s)
in hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020
2009-01-31 10:27:40,387 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 1 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
2009-01-31 10:27:40,443 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new log
file writer for path hdfs://mb0:9000/hbase/.META./1028785192/oldlogfile.log and region .META.,,1
2009-01-31 10:27:40,575 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new log
file writer for path hdfs://mb0:9000/hbase/sources/671225115/oldlogfile.log and region sources,,1229364117966
2009-01-31 10:27:41,171 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100003 total
edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
2009-01-31 10:27:41,173 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 2 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
2009-01-31 10:27:41,429 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new log
file writer for path hdfs://mb0:9000/hbase/dupehash/1607532582/oldlogfile.log and region dupehash,O��<L;h�,12
31779694744
2009-01-31 10:27:41,462 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.217:60020
lease expired
2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner
to stop
2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner
to stop
2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root
scanners notified
2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.RootScanner: RegionManager.rootScanner
exiting
2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.MetaScanner: RegionManager.metaScanner
exiting
2009-01-31 10:27:41,780 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001 total
edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
2009-01-31 10:27:41,781 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 3 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
2009-01-31 10:27:41,838 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.210:60020
lease expired
2009-01-31 10:27:41,866 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner
to stop
2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner
to stop
2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root
scanners notified
2009-01-31 10:27:42,557 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.212:60020
lease expired
2009-01-31 10:27:42,581 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner
to stop
2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner
to stop
2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root
scanners notified
2009-01-31 10:27:42,615 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100002 total
edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
2009-01-31 10:27:42,618 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 4 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
2009-01-31 10:27:43,356 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001 total
edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
2009-01-31 10:27:43,359 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 5 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
2009-01-31 10:27:43,404 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner
to stop
2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner
to stop
2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root
scanners notified
2009-01-31 10:27:43,991 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001 total
edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
{/code}

During the log replay, a log file was missing from HDFS.  Not sure why, there was a Datanode
crash that could be related.  More importantly, once it trips on the missing file it stops
the replay (even though there's another 37 logs).

{code}
2009-01-31 10:27:43,992 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 6 of 44:
hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
2009-01-31 10:27:44,022 WARN org.apache.hadoop.hbase.master.HMaster: Processing pending operations:
ProcessServerShutdown of 72.34.249.218:60020
java.io.FileNotFoundException: File does not exist: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:394)
        at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:742)
        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:705)
        at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:249)
        at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:427)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:360)
2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner
to stop
2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner
to stop
2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root
scanners notified
2009-01-31 10:27:44,023 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offlined
2009-01-31 10:27:44,023 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
2009-01-31 10:27:44,023 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60010]
2009-01-31 10:27:44,026 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:60010
{/code}

       Priority: Critical  (was: Major)

> During shutdown, odd handling of regionserver hosting META
> ----------------------------------------------------------
>
>                 Key: HBASE-1169
>                 URL: https://issues.apache.org/jira/browse/HBASE-1169
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.19.0, 0.20.0
>            Reporter: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.19.1, 0.20.0
>
>
> During shutdown of cluster, half way through quiescing servers there is a META scan in
the master.  The regions from servers whose leases are already canceled show up as invalid.
 (72.34.249.208 is hosting META)
> {code}
> 2009-01-31 10:25:42,571 INFO org.apache.hadoop.hbase.master.HMaster: Cluster shutdown
requested. Starting to quiesce servers
> 2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling
lease for 72.34.249.211:60020
> 2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: Region server
72.34.249.211:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling
lease for 72.34.249.216:60020
> 2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: Region server
72.34.249.216:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:47,840 INFO org.apache.hadoop.hbase.master.ServerManager: Region server
72.34.249.210:60020 quiesced
> 2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling
lease for 72.34.249.215:60020
> 2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: Region server
72.34.249.215:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling
lease for 72.34.249.213:60020
> 2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: Region server
72.34.249.213:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:49,378 INFO org.apache.hadoop.hbase.master.ServerManager: Region server
72.34.249.218:60020 quiesced
> 2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling
lease for 72.34.249.214:60020
> 2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: Region server
72.34.249.214:60020: MSG_REPORT_EXITING -- lease cancelled
> 2009-01-31 10:25:59,531 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scanning meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
> 2009-01-31 10:25:59,544 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of activitydupehash,,1229364212541 is not valid;  Server '72.34.249.214:60020' unknown.
> 2009-01-31 10:25:59,545 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of api,,1229364235220 is not valid;  Server '72.34.249.216:60020' unknown.
> 2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of apps,,1229364222879 is not valid;  Server '72.34.249.215:60020' unknown.
> 2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of assigners,,1229364037757 is not valid;  Server '72.34.249.214:60020' unknown.
> 2009-01-31 10:25:59,554 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of canoncache,,1229364041955 is not valid;  Server '72.34.249.215:60020' unknown.
> 2009-01-31 10:25:59,555 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment
of chunks,,1229390225893 is not valid;  Server '72.34.249.211:60020' unknown.
> {code}
> Shutdown then continues as the last servers are quiesced, but at the same time the Master
expires the lease on the regionserver that was hosting META and that it just scanned.  It
then starts to replay the logs for that regionserver in the middle of the shutdown.
> {code}
> 2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scan of 512 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
>  complete
> 2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META.
region(s) scanned
> 2009-01-31 10:26:59,530 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scanning meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
> 2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner
scan of 512 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
>  complete
> 2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META.
region(s) scanned
> 2009-01-31 10:27:40,374 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.218:60020
lease expired
> 2009-01-31 10:27:40,375 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo:
ProcessServerShutdown of 72.34.249.218:60020
> 2009-01-31 10:27:40,375 INFO org.apache.hadoop.hbase.master.RegionServerOperation: process
shutdown of server 72.34.249.218:60020: logSplit: false, rootRescanned: false, numberOfMetaRegions:
1, onlin
> eMetaRegions.size(): 1
> 2009-01-31 10:27:40,387 INFO org.apache.hadoop.hbase.regionserver.HLog: Splitting 44
log(s) in hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020
> 2009-01-31 10:27:40,387 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 1
of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
> 2009-01-31 10:27:40,443 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new
log file writer for path hdfs://mb0:9000/hbase/.META./1028785192/oldlogfile.log and region
.META.,,1
> 2009-01-31 10:27:40,575 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new
log file writer for path hdfs://mb0:9000/hbase/sources/671225115/oldlogfile.log and region
sources,,1229364117966
> 2009-01-31 10:27:41,171 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100003
total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
> 2009-01-31 10:27:41,173 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 2
of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
> 2009-01-31 10:27:41,429 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new
log file writer for path hdfs://mb0:9000/hbase/dupehash/1607532582/oldlogfile.log and region
dupehash,O��<L;h�,12
> 31779694744
> 2009-01-31 10:27:41,462 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.217:60020
lease expired
> 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
> 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root
scanner to stop
> 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta
scanner to stop
> 2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and
root scanners notified
> 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.RootScanner: RegionManager.rootScanner
exiting
> 2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.MetaScanner: RegionManager.metaScanner
exiting
> 2009-01-31 10:27:41,780 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001
total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
> 2009-01-31 10:27:41,781 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 3
of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
> 2009-01-31 10:27:41,838 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.210:60020
lease expired
> 2009-01-31 10:27:41,866 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
> 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root
scanner to stop
> 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta
scanner to stop
> 2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and
root scanners notified
> 2009-01-31 10:27:42,557 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.212:60020
lease expired
> 2009-01-31 10:27:42,581 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
> 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root
scanner to stop
> 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta
scanner to stop
> 2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and
root scanners notified
> 2009-01-31 10:27:42,615 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100002
total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
> 2009-01-31 10:27:42,618 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 4
of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
> 2009-01-31 10:27:43,356 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001
total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
> 2009-01-31 10:27:43,359 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 5
of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
> 2009-01-31 10:27:43,404 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables
quiesced. Proceeding with shutdown
> 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root
scanner to stop
> 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta
scanner to stop
> 2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and
root scanners notified
> 2009-01-31 10:27:43,991 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001
total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
> {code}
> During the log replay, a log file was missing from HDFS.  Not sure why, there was a Datanode
crash that could be related.  More importantly, once it trips on the missing file it stops
the replay (even though there's another 37 logs).
> {code}
> 2009-01-31 10:27:43,992 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 6
of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
> 2009-01-31 10:27:44,022 WARN org.apache.hadoop.hbase.master.HMaster: Processing pending
operations: ProcessServerShutdown of 72.34.249.218:60020
> java.io.FileNotFoundException: File does not exist: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:394)
>         at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>         at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:742)
>         at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:705)
>         at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:249)
>         at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:427)
>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:360)
> 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root
scanner to stop
> 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta
scanner to stop
> 2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and
root scanners notified
> 2009-01-31 10:27:44,023 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offlined
> 2009-01-31 10:27:44,023 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
> 2009-01-31 10:27:44,023 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60010]
> 2009-01-31 10:27:44,026 INFO org.mortbay.http.SocketListener: Stopped SocketListener
on 0.0.0.0:60010
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message