hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: RegionServer shutdown - how to recover?
Date Mon, 27 Sep 2010 18:25:09 GMT
If an old recovered.edits file was already there from a previous failure, one from
before your upgrade on Sept. 5th, then that would be sufficient to explain
the problem you encountered.
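
For what it's worth, one quick way to check whether stale split output is lying
around is to list the recovered.edits directories under the region dirs. A minimal
sketch against the Hadoop FileSystem API (the class name and path pattern are just
for illustration, and it assumes the default /hbase root; adjust to your hbase.rootdir):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: list recovered.edits files possibly left over from an earlier
// failed split. Layout assumed: /hbase/<table>/<region>/recovered.edits/<seqid>
public class FindStaleRecoveredEdits {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // picks up hdfs-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    FileStatus[] edits = fs.globStatus(new Path("/hbase/*/*/recovered.edits/*"));
    if (edits != null) {
      for (FileStatus f : edits) {
        System.out.println(f.getPath() + " len=" + f.getLen()
            + " mtime=" + f.getModificationTime());
      }
    }
  }
}

Anything in there with a modification time from before the crash would be worth a
closer look before you let the split run again.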

J-D

On Sun, Sep 26, 2010 at 8:09 AM, Matthew LeMieux <mdl@mlogiciels.com> wrote:
> I took a look at the bug (https://issues.apache.org/jira/browse/HBASE-3027).
>
> The issue I'm experiencing does not appear to be a result of deploying a new version.
> I've been on the same version since Sept 5th.
>
> -Matthew
>
> On Sep 22, 2010, at 9:01 PM, Stack wrote:
>
>> Could it be HBASE-3027? (Did you upgrade recently?)
>> St.Ack
>>
>> On Wed, Sep 22, 2010 at 8:21 PM, Matthew LeMieux <mdl@mlogiciels.com> wrote:
>>> That would be an awfully long GC pause.  GC logging was not enabled on that
>>> particular machine (it is now).
>>>
>>> The only exception in that part of the log file, "java.io.FileNotFoundException:
>>> Parent path is not a directory" occurred quite a few times, although HDFS seems to think
>>> of itself as being healthy.
>>>
>>>
>>> The master did some of this for a while:
>>>
>>> 2010-09-22 22:06:26,563 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog:
Split writer thread for region user,\x00b\x7Ft,1281914827925.9c90ade2d19b8ce0b1613982eba2a673.
got 1 to process
>>> 2010-09-22 22:06:26,580 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter:
Using syncFs -- HDFS-200
>>> 2010-09-22 22:06:26,580 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog:
Creating writer path=hdfs://domU-12-31-39-18-12-05.compute-1.internal:9000/hbase/user/08b0e0f67fde365167627ad6fcbaa7b4/recovered.edits/0000000000077355008
region=user,\x00W\x0D@,1281914368529.08b0e0f67fde365167627ad6fcbaa7b4.
>>> 2010-09-22 22:06:26,580 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog:
Split writer thread for region user,\x00W\x0D@,1281914368529.08b0e0f67fde365167627ad6fcbaa7b4.
Applied 1 total edits to user,\x00W\x0D@,1281914368529.08b0e0f67fde365167627ad6fcbaa7b4. in
40ms
>>>
>>> That was mixed in with several of these:
>>>
>>> 2010-09-22 22:06:26,922 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog:
Split writer thread for region user,\x01\xCB\xA8\x95,1281914501571.99a18a4a45748deb69fc6d7c804ffbe7.
Got while writing log entry to log
>>> java.io.IOException: cannot get log writer
>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:551)
>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog$1.call(HLog.java:1610)
>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog$1.call(HLog.java:1586)
>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>        at java.lang.Thread.run(Thread.java:619)
>>> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.FileNotFoundException:
Parent path is not a directory: /hbase/user/99a18a4a45748deb69fc6d7c804ffbe7/recovered.edits
>>>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:933)
>>>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addFile(FSDirectory.java:154)
>>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1153)
>>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1002)
>>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:407)
>>>        at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>>>
>>> And then a bunch of these:
>>>
>>> 2010-09-22 22:06:28,482 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog:
Closed hdfs://domU-12-31-39-18-12-05.compute-1.internal:9000/hbase/.META./1028785192/recovered.edits/0000000000077298364
>>> 2010-09-22 22:06:28,515 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog:
Closed hdfs://domU-12-31-39-18-12-05.compute-1.internal:9000/hbase/counters/692b9f45969bcc7ae2774b0e5fc17b12/recovered.edits/0000000000077298445
>>> 2010-09-22 22:06:29,431 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog:
Closed hdfs://domU-12-31-39-18-12-05.compute-1.internal:9000/hbase/user/9097ba3b5f7b255e19861dc2131c8477/recovered.edits/0000000000077298636
>>> 2010-09-22 22:06:29,448 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog:
Closed hdfs://domU-12-31-39-18-12-05.compute-1.internal:9000/hbase/user/dbc3619ce88ac8153334fb5f581c1575/recovered.edits/0000000000077298637
>>> 2010-09-22 22:06:29,464 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog:
Closed hdfs://domU-12-31-39-18-12-05.compute-1.internal:9000/hbase/user/0f03f4ef25d647784ce69cd1e354af94/recovered.edits/0000000000077298666
>>> ...
>>> 2010-09-22 22:06:30,614 WARN org.apache.hadoop.hbase.master.RegionServerOperationQueue:
Failed processing: ProcessServerShutdown of domU-12-31-39-18-1B-05.compute-1.internal,60020,1284840302559;
putting onto delayed todo queue
>>> java.io.IOException: java.io.IOException: cannot get log writer
>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog.writeEditsBatchToRegions(HLog.java:1510)
>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1335)
>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1210)
>>>        at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:299)
>>>        at org.apache.hadoop.hbase.master.RegionServerOperationQueue.process(RegionServerOperationQueue.java:147)
>>>        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:532)
>>> Caused by: java.io.IOException: cannot get log writer
>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:551)
>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog$1.call(HLog.java:1610)
>>>        at org.apache.hadoop.hbase.regionserver.wal.HLog$1.call(HLog.java:1586)
>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>        at java.lang.Thread.run(Thread.java:619)
>>> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.FileNotFoundException:
Parent path is not a directory: /hbase/user/2d225ad65207b0ab51b4604a9f889773/recovered.edits
>>>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:933)
>>>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addFile(FSDirectory.java:154)
>>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1153)
>>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1002)
>>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:407)
>>>        at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>>>
>>> There were also quite a few messages that indicate the split was unsuccessful:
>>>
>>> 2010-09-22 22:06:45,430 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog:
Split writer thread for region user,\x00\x14w\xB0,1284527871890.752e96713b4408d77b750af06d0e2bad.
got 57 to process
>>> 2010-09-22 22:06:45,432 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Found
existing old edits file. It could be the result of a previous failed split attempt. Deleting
hdfs://domU-12-31-39-18-12-05.compute-1.internal:9000/hbase/user/752e96713b4408d77b750af06d0e2bad/recovered.edits/0000000000077298635,
length=200866
>>>
>>> Does this point to a particular problem?
>>>
>>>
>>> Matthew
>>>
>>>
>>> On Sep 22, 2010, at 3:52 PM, Jean-Daniel Cryans wrote:
>>>
>>>> It seems that the log splitting didn't succeed (can't tell why, the log
>>>> snippet is too small). Did it get stuck on EOF or something like
>>>> that? In any case, it looks like a bug.
>>>>
>>>>>   * how do I avoid disruption when this sort of thing happens
>>>>
>>>> Your region server was unavailable for 3 minutes, probably a GC pause
>>>> (did you enable GC logging?), so the master split the logs after it
>>>> was notified via a ZK watcher. The timeout is 1 minute; you can set
>>>> it lower if you want to react to region server failure faster (but
>>>> be more affected by shorter GC pauses and the like). You can also lower
>>>> the recovery time by setting a smaller hbase.regionserver.maxlogs
>>>> (which is 32 by default), at the expense of flushing a lot more.
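>>>>
>>>> For example, a config sketch for hbase-site.xml (assuming the session timeout is
>>>> the usual zookeeper.session.timeout property; the values are illustrative only,
>>>> not recommendations for your cluster):
>>>>
>>>> <property>
>>>>   <name>zookeeper.session.timeout</name>
>>>>   <!-- declare a dead RS sooner, but short GC pauses will also trip this -->
>>>>   <value>30000</value>
>>>> </property>
>>>> <property>
>>>>   <name>hbase.regionserver.maxlogs</name>
>>>>   <!-- fewer WALs to split on recovery, at the cost of more frequent flushes -->
>>>>   <value>16</value>
>>>> </property>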
>>>>
>>>>>   * what is the best procedure for recovering from such a thing (i.e.,
>>>>>     is there something to be done other than simply restarting the region server?)
>>>>
>>>> It's fatal if the master isn't able to complete log splitting
>>>> properly, so restarting is your best bet until we can figure out how
>>>> you got stuck there.
>>>>
>>>> J-D
>>>>
>>>> On Wed, Sep 22, 2010 at 3:41 PM, Matthew LeMieux <mdl@mlogiciels.com>
wrote:
>>>>> I've just had another of those region server shutdowns (RegionServerSuicide).
>>>>>
>>>>>   * how do I avoid disruption when this sort of thing happens
>>>>>   * what is the best procedure for recovering from such a thing (i.e.,
>>>>>     is there something to be done other than simply restarting the region server?)
>>>>>
>>>>> I've heard others talk of this event as being easy to recover from.  I'd like to
>>>>> understand how I can get into a configuration where that is the case.  Currently,
>>>>> an RS shutdown causes significant disruption.  Clients are unable to query HBase.
>>>>> Simply restarting the region server doesn't improve things.  In every case, I've
>>>>> needed to do a complete restart of the entire cluster, which isn't exactly super
>>>>> fast.  A single region server shutdown means a minimum of 1/2 hour down time,
>>>>> and sometimes that is more like hours depending on the reason for the RS shutdown
>>>>> (once it took an entire day).  Is the experience similarly painful for others?
>>>>>
>>>>> Common error messages on the client are: java.lang.RuntimeException:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to domU-12-31-39-18-1B-05.compute-1.internal/10.104.55.247:60020
after attempts=1
>>>>> ... and ... the RS complains with a "NotServingRegion" exception.
>>>>>
>>>>> As you can see from the selected messages below, after I restart the
>>>>> affected RS, the cluster is still unusable, and gives no indication that this
>>>>> state will change.
>>>>>
>>>>> It is of course no problem to just restart a region server if it goes down
>>>>> (either with a script or manually); the real question is how do I avoid downtime?
>>>>>
>>>>> Details are below...
>>>>>
>>>>> Thank you for your help,
>>>>>
>>>>> Matthew
>>>>>
>>>>> I'm running in EC2 on DFS from CDH3b2, using a version of HBase from
>>>>> 20100830.
>>>>>
>>>>> START OF THE SHUTDOWN ON AFFECTED RS:
>>>>>
>>>>> 2010-09-22 22:00:39,887 DEBUG org.apache.hadoop.hbase.regionserver.Store:
Major compaction triggered on store pg_payload; time since last major compaction 93578344ms
>>>>> 2010-09-22 22:00:39,887 INFO org.apache.hadoop.hbase.regionserver.Store:
Started compaction of 1 file(s) in pg_payload of userh,\x1D\xAA2f\xE0\x1F\x12:\xA4\x98r\xFF!\xB8\x1A`,1285083827570.6be256da616fc20a6ac78a41952104e8.
 into hdfs://domU-12-31-39-18-12-05.compute-1.internal:9000/hbase/userh/6be256da616fc20a6ac78a41952104e8/.tmp,
sequenceid=77142618
>>>>> 2010-09-22 22:03:23,317 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
LRU Stats: total=1.57 GB, free=396.14 MB, max=1.95 GB, blocks=27819, accesses=38513233, hits=14159798,
hitRatio=36.76%%, evictions=6494, evicted=20462105, evictedPerRun=3150.924560546875
>>>>> 2010-09-22 22:03:23,361 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 181390ms for sessionid 0x12b266801550065,
closing socket connection and attempting reconnect
>>>>> 2010-09-22 22:03:23,363 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 183256ms for sessionid 0x12b266801550002,
closing socket connection and attempting reconnect
>>>>> 2010-09-22 22:03:23,451 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer:
Aborting region server serverName=domU-12-31-39-18-1B-05.compute-1.internal,60020,1284840302559,
load=(requests=4458, regions=134, usedHeap=8872, maxHeap=9995): Unhandled exception
>>>>> org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException:
Server REPORT rejected; currently processing domU-12-31-39-18-1B-05.compute-1.internal,60020,1284840302559
as dead server
>>>>>        at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:217)
>>>>>        at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:271)
>>>>>        at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:753)
>>>>>        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
>>>>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
>>>>>
>>>>> THE RESTART ON THE AFFECTED RS:
>>>>>
>>>>> 2010-09-22 22:10:04,534 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
regionserver60020 exiting
>>>>> 2010-09-22 22:13:23,449 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
Scanner 4628396768048238482 lease expired
>>>>> 2010-09-22 22:13:23,449 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
Scanner 891312872292456529 lease expired
>>>>> 2010-09-22 22:13:23,449 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
Scanner 2977737069391109815 lease expired
>>>>> 2010-09-22 22:13:23,450 INFO org.apache.hadoop.hbase.Leases: regionserver60020.leaseChecker
closing leases
>>>>> 2010-09-22 22:13:23,450 INFO org.apache.hadoop.hbase.Leases: regionserver60020.leaseChecker
closed leases
>>>>> 2010-09-22 22:13:23,450 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook:
Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-9,5,main]
>>>>> 2010-09-22 22:13:23,451 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook:
Starting fs shutdown hook thread.
>>>>> 2010-09-22 22:13:23,452 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook:
Shutdown hook finished.
>>>>> Wed Sep 22 22:15:11 UTC 2010 Starting regionserver on domU-12-31-39-18-1B-05
>>>>> 2010-09-22 22:15:12,716 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
vmInputArguments=[-Xmx768m, -XX:+HeapDumpOnOutOfMemoryError, -XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode,
-XX:+HeapDumpOnOutOfMemoryError, -XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode, -Xmx10000m,
-Dhbase.log.dir=/mnt/tmp/hbaselog, -Dhbase.log.file=hbase-mlcamus-regionserver-domU-12-31-39-18-1B-05.log,
-Dhbase.home.dir=/home/mlcamus/apps/hbase/bin/.., -Dhbase.id.str=mlcamus, -Dhbase.root.logger=INFO,DRFA]
>>>>> ...
>>>>> 2010-09-22 22:25:03,530 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:04,288 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:05,299 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:05,348 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:06,308 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:06,358 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:07,368 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:07,545 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:08,318 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:08,377 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:08,548 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:08,552 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; user,^@^Tj�,1284527871890.bfee076d62bd372a70299f5854517145.
>>>>> 2010-09-22 22:25:09,288 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:09,558 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> 2010-09-22 22:25:09,698 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; .META.,,1
>>>>> ... (these messages continue)
>>>>>
>>>>> ON THE MASTER WHEN THE RS WAS SHUTTING DOWN:
>>>>>
>>>>> 2010-09-22 21:59:51,588 INFO org.apache.hadoop.hbase.master.BaseScanner:
All 1 .META. region(s) scanned
>>>>> 2010-09-22 22:00:22,931 INFO org.apache.hadoop.hbase.master.ServerManager:
2 region servers, 0 dead, average load 134.0
>>>>> 2010-09-22 22:00:39,331 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {server: 10.104.37.247:60020, regionname: -ROOT-,,0.70236052,
startKey: <>}
>>>>> 2010-09-22 22:00:39,333 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.104.37.247:60020, regionname:
-ROOT-,,0.70236052, startKey: <>} complete
>>>>> 2010-09-22 22:00:51,368 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {server: 10.104.55.247:60020, regionname: .META.,,1.1028785192,
startKey: <>}
>>>>> 2010-09-22 22:00:51,801 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 266 row(s) of meta region {server: 10.104.55.247:60020,
regionname: .META.,,1.1028785192, startKey: <>} complete
>>>>> 2010-09-22 22:00:51,801 INFO org.apache.hadoop.hbase.master.BaseScanner:
All 1 .META. region(s) scanned
>>>>> 2010-09-22 22:01:22,941 INFO org.apache.hadoop.hbase.master.ServerManager:
2 region servers, 0 dead, average load 134.0
>>>>> 2010-09-22 22:01:39,341 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {server: 10.104.37.247:60020, regionname: -ROOT-,,0.70236052,
startKey: <>}
>>>>> 2010-09-22 22:01:39,343 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.104.37.247:60020, regionname:
-ROOT-,,0.70236052, startKey: <>} complete
>>>>> 2010-09-22 22:01:51,371 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {server: 10.104.55.247:60020, regionname: .META.,,1.1028785192,
startKey: <>}
>>>>> 2010-09-22 22:02:22,951 INFO org.apache.hadoop.hbase.master.ServerManager:
2 region servers, 0 dead, average load 134.0
>>>>> 2010-09-22 22:02:24,203 INFO org.apache.hadoop.hbase.master.ServerManager:
domU-12-31-39-18-1B-05.compute-1.internal,60020,1284840302559 znode expired
>>>>> 2010-09-22 22:02:24,204 DEBUG org.apache.hadoop.hbase.master.ServerManager:
Added=domU-12-31-39-18-1B-05.compute-1.internal,60020,1284840302559 to dead servers, added
shutdown processing operation
>>>>> 2010-09-22 22:02:24,204 INFO org.apache.hadoop.hbase.master.RegionManager:
META region whose startkey is  removed from onlineMetaRegions
>>>>> 2010-09-22 22:02:24,204 DEBUG org.apache.hadoop.hbase.master.RegionServerOperationQueue:
Processing todo: ProcessServerShutdown of domU-12-31-39-18-1B-05.compute-1.internal,60020,1284840302559
>>>>> 2010-09-22 22:02:24,204 INFO org.apache.hadoop.hbase.master.RegionServerOperation:
Process shutdown of server domU-12-31-39-18-1B-05.compute-1.internal,60020,1284840302559:
logSplit: false, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
>>>>> 2010-09-22 22:02:24,242 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Splitting 35 hlog(s) in hdfs://domU-12-31-39-18-12-05.compute-1.internal:9000/hbase/.logs/domU-12-31-39-18-1B-05.compute-1.internal,60020,1284840302559
>>>>>
>>>>> ON THE MASTER AFTER GIVING A FEW MINUTES TO RECOVER:
>>>>>
>>>>> 2010-09-22 22:31:00,300 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation:
numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
>>>>> 2010-09-22 22:31:00,300 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation:
Requeuing because not all meta regions are online
>>>>> 2010-09-22 22:31:00,301 DEBUG org.apache.hadoop.hbase.master.RegionServerOperationQueue:
Processing todo: ProcessRegionClose of userh,R\x80\xBD\x0F\xCB\xEAf\x95"+\xE1\x9A)/\xD7\x19,1285081795205.bf1a30fb67c1b0ea421da1f458d83c44.,
false, reassign: true
>>>>> 2010-09-22 22:31:00,301 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation:
numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
>>>>> 2010-09-22 22:31:00,301 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation:
Requeuing because not all meta regions are online
>>>>> 2010-09-22 22:31:10,310 DEBUG org.apache.hadoop.hbase.master.RegionServerOperationQueue:
Processing todo: ProcessRegionClose of userh,R\x80\xBD\x0F\xCB\xEAf\x95"+\xE1\x9A)/\xD7\x19,1285081795205.bf1a30fb67c1b0ea421da1f458d83c44.,
false, reassign: true
>>>>> 2010-09-22 22:31:10,310 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation:
numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
>>>>> 2010-09-22 22:31:10,310 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation:
Requeuing because not all meta regions are online
>>>>> 2010-09-22 22:31:10,311 DEBUG org.apache.hadoop.hbase.master.RegionServerOperationQueue:
Processing todo: ProcessRegionClose of userh,\xE5\x1A\x93\xCEQ\xC9\x8E\x8DIlvh9\xE1\xC8O,1285067882016.8aecae7491b95dd76aefab0058a71cff.,
false, reassign: true
>>>>> 2010-09-22 22:31:10,311 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation:
numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
>>>>> 2010-09-22 22:31:10,311 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation:
Requeuing because not all meta regions are online
>>>>> 2010-09-22 22:31:20,320 DEBUG org.apache.hadoop.hbase.master.RegionServerOperationQueue:
Processing todo: ProcessRegionClose of userh,\xE5\x1A\x93\xCEQ\xC9\x8E\x8DIlvh9\xE1\xC8O,1285067882016.8aecae7491b95dd76aefab00
>>>>>
>>>>> WHEN RUNNING "list" FROM "hbase shell":
>>>>>
>>>>> org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException:
.META.,,1
>>>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2217)
>>>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1808)
>>>>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
>>>>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
>>>>>
>>>>>
>>>>>
>>>
>>>
>
>
