hbase-user mailing list archives

From Steve Boyle <steve.bo...@connexity.com>
Subject FW: recovering from a region server failure
Date Mon, 14 May 2012 20:59:52 GMT
Following up on a discussion that started in asynchbase group...

It looks like our RS failure was related to a ZooKeeper timeout; it seems we may have overloaded
that RS.  The cause of the failure is not as important to me right now as our ability to recover
from the failure.  To answer some of Stack's questions:

We are running hbase version hbase-0.90.4-cdh3u3.

We are running hadoop version hadoop-0.20.2-cdh3u3.

> Anything particular about this 'OPENING' region?  If you trace it in the master log, anything
> anomalous going on (it was being balanced when the regionserver it was on crashed or something
> like that)?

The anomalous thing seems to be this:
2012-05-12 00:58:14,325 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed
path hdfs://db1/hbase/PROD_rDay/9d0f42dd7d703c8ae029e31ac7f13f44/recovered.edits/0000000017087735812.temp
(wrote 3015517 edits in 34450ms)

There were only 3 tables/regions with a large number of edits - all 3 of them contain only
Atomic Increments.  This is the only region that had over 3M edits; the other 2 similar regions
(each for a different table that holds similar data) had < 825K edits.  All other tables/regions
this server opened had < 10K edits.  It's not clear why this one region had so many edits to
write; one of the other Atomic Increment only tables gets more rows written to it and has
the same write rate/flush interval in the client code (we coalesce atomic increments into
batches and write the batches out on a timer).
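
For context, here is a minimal sketch of the kind of coalescing we do on the client.  Our real
code uses asynchbase; this sketch uses the plain HBase 0.90 client API just to illustrate the
pattern, and the table/family/qualifier handling is simplified and hypothetical:

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

/** Coalesces counter deltas in memory and writes them out as atomic increments on a timer. */
public class IncrementCoalescer {
  private final HTable table;                        // counter table, e.g. PROD_rDay
  private final byte[] family = Bytes.toBytes("C");  // single column family, as in our schema
  private final Map<String, AtomicLong> pending = new ConcurrentHashMap<String, AtomicLong>();
  private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

  public IncrementCoalescer(Configuration conf, String tableName, long flushIntervalMs)
      throws IOException {
    this.table = new HTable(conf, tableName);
    // Flush the accumulated deltas on a fixed timer rather than per event.
    timer.scheduleAtFixedRate(new Runnable() {
      public void run() { flush(); }
    }, flushIntervalMs, flushIntervalMs, TimeUnit.MILLISECONDS);
  }

  /** Accumulate a delta for row/qualifier; '|' is assumed never to appear in row keys. */
  public void add(String row, String qualifier, long delta) {
    String key = row + "|" + qualifier;
    AtomicLong counter = pending.get(key);
    if (counter == null) {
      AtomicLong fresh = new AtomicLong();
      counter = pending.putIfAbsent(key, fresh);
      if (counter == null) counter = fresh;
    }
    counter.addAndGet(delta);
  }

  /** Issue one atomic increment per coalesced row/qualifier. */
  private void flush() {
    for (Map.Entry<String, AtomicLong> e : pending.entrySet()) {
      long delta = e.getValue().getAndSet(0);
      if (delta == 0) continue;
      String[] parts = e.getKey().split("\\|", 2);
      try {
        table.incrementColumnValue(Bytes.toBytes(parts[0]), family,
            Bytes.toBytes(parts[1]), delta);
      } catch (IOException ex) {
        e.getValue().addAndGet(delta);  // put the delta back; it will retry on the next tick
      }
    }
  }
}

The idea is that a burst of updates to the same counter turns into a single increment RPC per
flush interval, which is part of why I'm surprised this one region ended up with so many more
edits to replay than the others.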


> Sounds like something up w/ this particular region.  When it's opening on another regionserver,
> what's showing in the regionserver logs?

Looks like it is replaying logs and the ZK lease times out during that time:
2012-05-12 00:58:28,767 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore
flush for PROD_rDay,m:669:20111104,1323489256750.9d0f42dd7d703c8ae029e31ac7f13f44., current
region memstore size 128.0m; wal is null, using passed sequenceid=17087861627
2012-05-12 00:58:28,767 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting,
commencing flushing stores
2012-05-12 00:58:30,054 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed
file at hdfs://db1/hbase/PROD_rHour/fe15757724e007251016b638ab55f151/.tmp/3732133664174711660
to hdfs://db1/hbase/PROD_rHour/fe15757724e007251016b638ab55f151/C/5986787969762915560
2012-05-12 00:58:30,057 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://db1/hbase/PROD_rHour/fe15757724e007251016b638ab55f151/C/5986787969762915560,
entries=579194, sequenceid=17091885583, memsize=128.0m, filesize=5.4m
2012-05-12 00:58:30,057 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore
flush of ~128.0m for region PROD_rHour,,1323484845768.fe15757724e007251016b638ab55f151. in
3976ms, sequenceid=17091885583, compaction requested=false; wal=null
2012-05-12 00:58:32,982 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed
file at hdfs://db1/hbase/PROD_rDay/9d0f42dd7d703c8ae029e31ac7f13f44/.tmp/2211059944788521319
to hdfs://db1/hbase/PROD_rDay/9d0f42dd7d703c8ae029e31ac7f13f44/C/414592015524271636
2012-05-12 00:58:32,985 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://db1/hbase/PROD_rDay/9d0f42dd7d703c8ae029e31ac7f13f44/C/414592015524271636,
entries=573364, sequenceid=17087861627, memsize=128.0m, filesize=6.1m
2012-05-12 00:58:32,985 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore
flush of ~128.0m for region PROD_rDay,m:669:20111104,1323489256750.9d0f42dd7d703c8ae029e31ac7f13f44.
in 4218ms, sequenceid=17087861627, compaction requested=true; wal=null
2012-05-12 00:58:34,722 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore
flush for PROD_rHour,,1323484845768.fe15757724e007251016b638ab55f151., current region memstore
size 128.0m; wal is null, using passed sequenceid=17092246834
2012-05-12 00:58:34,722 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting,
commencing flushing stores
2012-05-12 00:58:39,260 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore
flush for PROD_rDay,m:669:20111104,1323489256750.9d0f42dd7d703c8ae029e31ac7f13f44., current
region memstore size 128.0m; wal is null, using passed sequenceid=17087925859

2012-05-12 00:58:44,807 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed
file at hdfs://db1/hbase/PROD_rDay/9d0f42dd7d703c8ae029e31ac7f13f44/.tmp/6354552965813660275
to hdfs://db1/hbase/PROD_rDay/9d0f42dd7d703c8ae029e31ac7f13f44/C/6008227162161883094
2012-05-12 00:58:44,810 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://db1/hbase/PROD_rDay/9d0f42dd7d703c8ae029e31ac7f13f44/C/6008227162161883094,
entries=572482, sequenceid=17087925859, memsize=128.0m, filesize=6.1m
2012-05-12 00:58:44,810 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore
flush of ~128.0m for region PROD_rDay,m:669:20111104,1323489256750.9d0f42dd7d703c8ae029e31ac7f13f44.
in 5550ms, sequenceid=17087925859, compaction requested=true; wal=null
2012-05-12 00:58:46,708 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Applied 1633459,
skipped 554562, firstSequenceidInLog=17087735911, maxSequenceidInLog=17092544014

memstore flushing continues for some time... until...

2012-05-12 01:05:19,756 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore
flush for PROD_rDay,m:669:20111104,1323489256750.9d0f42dd7d703c8ae029e31ac7f13f44., current
region memstore size 128.0m; wal is null, using passed sequenceid=17090407726
2012-05-12 01:05:19,756 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting,
commencing flushing stores
2012-05-12 01:05:23,547 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
Failed open of region=PROD_rDay,m:669:20111104,1323489256750.9d0f42dd7d703c8ae029e31ac7f13f44.
org.apache.hadoop.hbase.DroppedSnapshotException: region: PROD_rDay,m:669:20111104,1323489256750.9d0f42dd7d703c8ae029e31ac7f13f44.
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1055)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2042)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1917)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:367)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2770)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2756)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:312)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:99)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
No lease on /hbase/PROD_rDay/9d0f42dd7d703c8ae029e31ac7f13f44/.tmp/6636936175557646629 File does not exist.
[Lease.  Holder: DFSClient_hb_rs_hb5,60020,1336672254585_1336672255146, pendingcreates: 2]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1593)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1584)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1627)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:687)
        at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
        at $Proxy5.complete(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy5.complete(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3941)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3856)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.close(HFile.java:655)
        at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:877)
        at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:539)
        at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:494)
        at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:83)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1573)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1033)
        ... 11 more
2012-05-12 01:05:23,547 INFO org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
Opening of region REGION => {NAME => 'PROD_rDay,m:669:20111104,1323489256750.9d0f42dd7d703c8ae029e31ac7f13f44.',
STARTKEY => 'm:669:20111104', ENDKEY => '', ENCODED => 9d0f42dd7d703c8ae029e31ac7f13f44,
TABLE => {{NAME => 'PROD_rDay', FAMILIES => [{NAME => 'C', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'LZO', TTL => '2147483647',
BLOCKSIZE => '65536', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}} failed, marking
as FAILED_OPEN in ZK
2012-05-12 01:05:23,547 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x23660ab0d427043
Attempting to transition node 9d0f42dd7d703c8ae029e31ac7f13f44 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_FAILED_OPEN
2012-05-12 01:05:23,547 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x23660ab0d427043
Attempt to transition the unassigned node for 9d0f42dd7d703c8ae029e31ac7f13f44 from RS_ZK_REGION_OPENING
to RS_ZK_REGION_FAILED_OPEN failed, the node existed but was version 7 not the expected version
5
2012-05-12 01:05:23,547 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
Unable to mark region REGION => {NAME => 'PROD_rDay,m:669:20111104,1323489256750.9d0f42dd7d703c8ae029e31ac7f13f44.',
STARTKEY => 'm:669:20111104', ENDKEY => '', ENCODED => 9d0f42dd7d703c8ae029e31ac7f13f44,
TABLE => {{NAME => 'PROD_rDay', FAMILIES => [{NAME => 'C', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'LZO', TTL => '2147483647',
BLOCKSIZE => '65536', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}} as FAILED_OPEN.
It's likely that the master already timed out this open attempt, and thus another RS already
has the region.


I'm not clear why this particular region had so many edits to write compared to the other
regions that were running on the failed server.  It seems that flushing the memstore is taking
a long time and allowing the lease for the region to time out.  What can be done about that?
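
If the root issue is just that replaying 3M+ edits (with the repeated memstore flushes shown
above) takes longer than the master is willing to wait for the open, would bumping the relevant
timeouts be a reasonable stopgap?  Something like this in hbase-site.xml is what I have in mind -
the key names are what I believe apply to 0.90.x (please correct me if not) and the values are
only illustrative:

<configuration>
  <!-- Give the regionserver more slack before its ZooKeeper session expires. -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>180000</value>
  </property>
  <!-- Let a slow region open (edits replay plus flushes) finish before the master
       times out the open attempt and re-assigns the region elsewhere. -->
  <property>
    <name>hbase.master.assignment.timeoutmonitor.timeout</name>
    <value>600000</value>
  </property>
</configuration>

I realize that only papers over the problem if the region legitimately has that many edits to
replay every time.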

Per the history below, once we bring the failed RS back online, all the issues seem to clear
up and things start working as expected again.  I'm not clear on what the failed RS is doing
at startup to clear the issue up.  I'd like to get to a point where the failed RS can stay
down indefinitely and all of the regions come back up on other RSs without intervention.
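
For completeness, the manual poking we did from the shell ('unassign' with the force flag) can
also be scripted; a rough sketch, assuming I'm reading the 0.90 HBaseAdmin API correctly (the
region name is our stuck region from the logs above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

/** Force-unassign a stuck region and ask the master to assign it again. */
public class ReassignStuckRegion {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Full region name as it appears in the logs above.
    byte[] region = Bytes.toBytes(
        "PROD_rDay,m:669:20111104,1323489256750.9d0f42dd7d703c8ae029e31ac7f13f44.");
    admin.unassign(region, true);  // force, same as 'unassign <region>, true' in the shell
    admin.assign(region, true);    // ask the master to pick a new regionserver for it
  }
}

That said, what we really want is for no intervention to be needed at all, so I'm more
interested in why the open keeps failing than in better ways to poke it.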

Thanks,
Steve


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Monday, May 07, 2012 1:55 PM
To: Steve Boyle
Subject: Re: recovering from a region server failure

On Mon, May 7, 2012 at 11:09 AM, Steve Boyle <steve.boyle@connexity.com> wrote:
> We experienced our Region Server failure today and I'm hoping I can get your opinion
> on what we saw.  Everything was running fine, then we had a Region Server crash (not sure
> why it crashed at the moment).

It'll say why in the logs.  You might have to read up from the last log message a page or two
to get to the root reason (look for stuff like an expired zookeeper session -- that's a common
one).

> All of the regions that the failed RS was serving fail over to other RSs pretty quickly
> (quick enough for our purposes, say in less than a minute), except for a single region.  One
> region seems to get stuck in the 'opening' state on another, running, RS.

What version of hbase? (I went back over this correspondence and I don't think you say)

Anything particular about this 'OPENING' region?  If you trace it in the master log, anything
anomalous going on (it was being balanced when the regionserver it was on crashed or something
like that)?
Master log is noisy but you can usually figure out the basics w/ a grep on the region name.


> We go into the hbase shell and try to 'unassign' the region (with the force flag); this
> causes the region to get stuck in 'opening' on yet another region server.  We try to 'unassign'
> the region several times after it does not open on a given server for about 30-45 seconds;
> it does not seem to open at all.

Sounds like something up w/ this particular region.  When it's opening on another regionserver,
what's showing in the regionserver logs?

> Somewhere in this process, another person restarts the failed RS - just as soon as the
> failed RS is restarted, the region that is stuck in 'opening' is able to open/start on whatever
> region server it was last waiting to open on.  It seems odd that we need to restart the failed
> RS to get all of the regions it used to be serving back online.  Any suggestions as to how
> we can better understand or even solve this problem?
>

Hmm..  What version of hadoop?  Are you running an appropriate hadoop?
See http://hbase.apache.org/book.html#hadoop (I'm guessing that if you looked in the master logs
you'd see a complaint that the master is unable to split the WAL logs because the lease is still
held by the regionserver that 'crashed'.... this would happen if you were running a hadoop that
didn't support sync and was missing the steal-lease api necessary for hbase's smooth sailing).

We should also probably do any continuation over on the hbase lists, Steve, rather than here on
the async list, since it seems you are having hbase rather than asynchbase issues.

Go easy,
St.Ack
