hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: help why do my regionservers shut themselves down?
Date Tue, 23 Apr 2013 02:35:28 GMT
Kaveh:
What version of HBase are you using?
Did you observe anything else happening in your cluster around
2013-04-22 16:47:56? See below:

2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction interrupted by user:
java.io.InterruptedIOException: Aborting compaction of store f in region t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23. because user requested stop.
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:998)
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:779)
        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:776)
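
To check what else was logged around that moment, one option is to filter the regionserver log by timestamp. The following is a minimal sketch, not an HBase tool: it assumes log entries start with the `YYYY-MM-DD HH:MM:SS,mmm` timestamp seen in this thread, and the names `parse_ts` and `lines_near` are hypothetical helpers introduced only for illustration.

```python
from datetime import datetime, timedelta

# Timestamp layout assumed from the log lines in this thread,
# e.g. "2013-04-22 16:47:56,830" (23 characters).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:23], LOG_TS_FORMAT)
    except ValueError:
        return None


def lines_near(lines, center, window_seconds=60):
    """Yield log lines stamped within window_seconds of center.

    Lines without a timestamp (stack-trace frames, wrapped messages)
    are kept or dropped together with the entry that precedes them.
    """
    center_ts = datetime.strptime(center, LOG_TS_FORMAT)
    window = timedelta(seconds=window_seconds)
    keep = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            keep = abs(ts - center_ts) <= window
        if keep:
            yield line
```

Passing an open log file as `lines` and "2013-04-22 16:47:56,830" as `center` would return every entry within a minute of the interrupted compaction, stack traces included.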

On Mon, Apr 22, 2013 at 6:46 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Kaveh,
>
> The answer may already be in the logs you sent ;)
>
> "This disconnect could have been caused by a network partition or a
> long-running GC pause, either way it's recommended that you verify
> your environment."
>
> Do you have GC logs? Have you tried anything to solve that?
>
> JM
>
> 2013/4/22 kaveh minooie <kaveh@plutoz.com>:
> >
> > Hi
> >
> > After a few MapReduce jobs, my regionservers shut themselves down. This is
> > the most recent occurrence:
> >
> > 2013-04-22 16:47:21,843 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, trying to reconnect.
> > 2013-04-22 16:47:21,843 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=d1r1n17.prod.plutoz.com,60020,1366657358443, load=(requests=5392, regions=196, usedHeap=1063, maxHeap=3966): regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from ZooKeeper, aborting
> > org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:270)
> >         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:523)
> >         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:499)
> > 2013-04-22 16:47:21,843 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Trying to reconnect to zookeeper.
> > 2013-04-22 16:47:21,844 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: requests=1794, regions=196, stores=1561, storefiles=1585, storefileIndexSize=104, memstoreSize=306, compactionQueueSize=10, flushQueueSize=0, usedHeap=1073, maxHeap=3966, blockCacheSize=661986032, blockCacheFree=169901776, blockCacheCount=7242, blockCacheHitCount=910925, blockCacheMissCount=1558134, blockCacheEvictedCount=1344753, blockCacheHitRatio=36, blockCacheHitCachingRatio=40
> > 2013-04-22 16:47:21,844 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from ZooKeeper, aborting
> > 2013-04-22 16:47:21,844 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > 2013-04-22 16:47:21,900 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Too many consecutive RollWriter requests, it's a sign of the total number of live datanodes is lower than the tolerable replicas.
> > 2013-04-22 16:47:22,341 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=180000 watcher=hconnection
> > 2013-04-22 16:47:22,357 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 1 regions to close
> > 2013-04-22 16:47:22,394 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server d1r2n2.prod.plutoz.com/10.0.0.66:2181. Will not attempt to authenticate using SASL (unknown error)
> > 2013-04-22 16:47:22,395 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to d1r2n2.prod.plutoz.com/10.0.0.66:2181, initiating session
> > 2013-04-22 16:47:22,397 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server d1r2n2.prod.plutoz.com/10.0.0.66:2181, sessionid = 0x13dd980d2abbf93, negotiated timeout = 40000
> > 2013-04-22 16:47:22,400 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Reconnected successfully. This disconnect could have been caused by a network partition or a long-running GC pause, either way it's recommended that you verify your environment.
> > 2013-04-22 16:47:22,400 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction interrupted by user:
> > java.io.InterruptedIOException: Aborting compaction of store f in region t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23. because user requested stop.
> >         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:998)
> >         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:779)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:776)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:721)
> >         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
> > 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion: aborted compaction on region t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23. after 5mins, 58sec
> > 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver60020.compactor exiting
> > 2013-04-22 16:47:56,832 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23.
> > 2013-04-22 16:47:57,363 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: regionserver60020.logSyncer exiting
> > 2013-04-22 16:47:57,366 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closing leases
> > 2013-04-22 16:47:57,366 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed leases
> > 2013-04-22 16:47:57,366 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
> > 2013-04-22 16:47:57,497 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-15,5,main]
> > 2013-04-22 16:47:57,497 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
> > 2013-04-22 16:47:57,497 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
> > 2013-04-22 16:47:57,504 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020.leaseChecker closing leases
> > 2013-04-22 16:47:57,504 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020.leaseChecker closed leases
> > 2013-04-22 16:47:57,598 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
> >
> > I would appreciate it very much if someone could explain to me what just
> > happened here.
> >
> > thanks,
>
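
One detail in the quoted log is worth a closer look: the client asks for sessionTimeout=180000, yet the server reports negotiated timeout = 40000. ZooKeeper servers cap session timeouts at maxSessionTimeout, which defaults to 20 x tickTime (40000 ms with the default tickTime of 2000), so with that negotiated value a pause of roughly 40 seconds, not 180, would be enough to expire a session. Those two lines come from the hconnection client reconnect, so whether the regionserver's own session had the same negotiated value is an assumption to verify, not something the log proves. A minimal sketch for pulling both values out of a log follows; the function name `session_timeouts` is hypothetical.

```python
import re

# Patterns taken from the ZooKeeper client lines quoted in this thread.
REQUESTED = re.compile(r"sessionTimeout=(\d+)")
NEGOTIATED = re.compile(r"negotiated timeout = (\d+)")


def session_timeouts(lines):
    """Pair each requested session timeout with the next negotiated one.

    Returns a list of (requested_ms, negotiated_ms) tuples in log order.
    """
    pairs = []
    requested = None
    for line in lines:
        match = REQUESTED.search(line)
        if match:
            requested = int(match.group(1))
        match = NEGOTIATED.search(line)
        if match and requested is not None:
            pairs.append((requested, int(match.group(1))))
            requested = None
    return pairs


if __name__ == "__main__":
    log = [
        "2013-04-22 16:47:22,341 INFO org.apache.zookeeper.ZooKeeper: "
        "Initiating client connection, connectString=zk1:2181 "
        "sessionTimeout=180000 watcher=hconnection",
        "2013-04-22 16:47:22,397 INFO org.apache.zookeeper.ClientCnxn: "
        "Session establishment complete on server "
        "d1r2n2.prod.plutoz.com/10.0.0.66:2181, "
        "sessionid = 0x13dd980d2abbf93, negotiated timeout = 40000",
    ]
    for requested, negotiated in session_timeouts(log):
        if negotiated < requested:
            print(f"requested {requested} ms, server granted {negotiated} ms")
```

If the granted value is lower than expected, the ZooKeeper server's tickTime and maxSessionTimeout settings are the place to look, alongside the GC logs JM asked about.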
