hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ryan Rawson <ryano...@gmail.com>
Subject Re: Expired Scanner Lease == RegionServer death ?
Date Thu, 26 Nov 2009 20:59:28 GMT
Hey,

Looking at the log below, it looks like you have run into a Zookeeper
session expiry event.  If a RS loses it's ZK lease, it must give up,
since the regions it has open have probably been reassigned already,
and dual assignment Is Very Bad.  This is probably a GC related pause,
this line in particular is indicative:

>  2009-11-26 01:53:09,867 WARN org.apache.hadoop.hbase.util.Sleeper: We
> slept 72462ms, ten times longer than scheduled: 3000

In the log somewhere there should be a "session expired" zookeeper
message.  I'm not seeing it in the snippets you post.

There are several potential problems here:
- You need to be using the CMS GC. This is very important.
- If you are hitting your cluster hard with a job and maxing out the
IOW, you might be starving the ZK quorum of the IO bandwidth it needs
to do operations.

Can you tell us more about how you are running things?  What JVM, GC
config, node count, node spec, etc.

-ryan

On Thu, Nov 26, 2009 at 8:30 AM, Joost Ouwerkerk <joost@openplaces.org> wrote:
> I'm experiencing a scenario whereby long-running MapReduce tasks appear to
> be taking RegionServers down, wondering if
>  1) anyone else is experiencing this
>  2) this is expected behaviour, and if so,
>  3) what I can do about it.
>
> Scenario looks something like this (although I may be misinterpreting):
>
> 1.TableMapper task has not completed after about one minute
>
> 2. RegionServer reports:
>  2009-11-26 01:53:09,867 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> 637450415407903410 lease expired
>  2009-11-26 01:53:09,902 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> -2878823706426617596 lease expired
>  2009-11-26 01:53:09,902 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> -4783927291358135864 lease expired
>  2009-11-26 01:53:09,902 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> 5282836853577726911 lease expired
>
> Followed By:
>  2009-11-26 01:53:09,882 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x252df4a91a0005 to sun.nio.ch.SelectionKeyImpl@49e808ca
>    java.io.IOException: TIMED OUT
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
>  2009-11-26 01:53:09,867 WARN org.apache.hadoop.hbase.util.Sleeper: We
> slept 72462ms, ten times longer than scheduled: 3000
>  2009-11-26 01:53:09,918 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer:
>    org.apache.hadoop.hbase.UnknownScannerException: Name:
> 5282836853577726911
>        at
> org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1925)
>
> 3. Task Fails. (normal behaviour)
>
> 4. RegionServer starts closing its regions:
>  2009-11-26 01:53:10,035 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> state: Disconnected, type: None, path: null
>  2009-11-26 01:53:10,104 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
>  2009-11-26 01:53:10,187 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed crawled_pages, ...
>  2009-11-26 01:53:11,653 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed crawled_pages, ...
>  2009-11-26 01:53:14,184 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed crawled_pages, ...
>  2009-11-26 01:53:18,567 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed crawled_pages, ...
>  2009-11-26 01:53:19,147 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed crawled_pages, ...
>  2009-11-26 01:53:21,265 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed crawled_pages, ...
>  2009-11-26 01:53:45,112 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed crawled_pages, ...
>
> 5. RegionServer shuts down:
>  2009-11-26 01:53:12,425 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
> server on 60020
>  2009-11-26 01:53:12,425 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 1 on 60020: exiting
>  2009-11-26 01:53:12,425 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 0 on 60020: exiting
>  2009-11-26 01:53:12,425 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 2 on 60020: exiting
>  2009-11-26 01:53:12,443 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
>  2009-11-26 01:53:12,526 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
> IPC Server Responder
>  2009-11-26 01:53:13,169 INFO
> org.apache.hadoop.hbase.regionserver.LogFlusher:
> regionserver/192.168.1.111:60020.logFlusher exiting
>  2009-11-26 01:53:13,170 INFO
> org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
>  2009-11-26 01:53:13,170 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
> regionserver/192.168.1.111:60020.cacheFlusher exiting
>  2009-11-26 01:53:13,170 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
> regionserver/192.168.1.111:60020.majorCompactionChecker exiting
>  2009-11-26 01:53:13,169 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> regionserver/192.168.1.111:60020.compactor exiting
>  2009-11-26 01:53:13,449 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
>  2009-11-26 01:53:13,450 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
> 192.168.1.111:60020
>  2009-11-26 01:53:19,867 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
>  2009-11-26 01:53:19,867 INFO org.apache.zookeeper.ZooKeeper: Closing
> session: 0x252df4a91a0005
>  2009-11-26 01:53:19,867 INFO org.apache.zookeeper.ClientCnxn: Closing
> ClientCnxn for session: 0x252df4a91a0005
>  2009-11-26 01:53:19,867 INFO org.apache.zookeeper.ClientCnxn:
> Disconnecting ClientCnxn for session: 0x252df4a91a0005
>  2009-11-26 01:53:19,868 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
>  2009-11-26 01:53:19,868 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x252df4a91a0005 closed
>  2009-11-26 01:53:19,902 INFO org.apache.hadoop.hbase.Leases:
> regionserver/192.168.1.111:60020.leaseChecker closing leases
>  2009-11-26 01:53:19,903 INFO org.apache.hadoop.hbase.Leases:
> regionserver/192.168.1.111:60020.leaseChecker closed leases
>  2009-11-26 01:53:19,969 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
> 192.168.1.111:60020 exiting
>  2009-11-26 01:53:19,992 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
> thread.
>  2009-11-26 01:53:19,992 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>
> Joost
>

Mime
View raw message