hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: max regionserver handler count
Date Mon, 29 Apr 2013 09:25:13 GMT
I noticed the 8 occurrences of 0x703e... following the region server name in the abort message.

I wonder why the repetition?

Cheers  

On Apr 29, 2013, at 2:17 AM, Viral Bajaria <viral.bajaria@gmail.com> wrote:

> On Sun, Apr 28, 2013 at 7:37 PM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
> 
>> So you mean this happens when the handler count is more than 5k, but not
>> when it is lower. Have you repeated this behaviour?
> 
>> When you say it was bouncing around different states, I suspect the ROOT
>> assignment was the problem and something got messed up there.
>> If the reason was the handler count, that needs a different analysis.
>> 
>> I think that if you can repeat the experiment and get the same behaviour,
>> you can share the logs so that we can ascertain the exact problem.
> 
> Yeah, I have repeated the behavior. But it seems the issue is due to some
> weird pauses in the region server whenever I bump up the region handler
> count (logs are below). I doubt the issue is GC; a pause that long should
> not happen at startup, even with a 48 GB heap, and there are no active
> clients either.
> 
> I can safely say this is due to bumping up the region handler count: I
> started 3 regionservers with 5000 handlers and 3 with 15000 handlers. The
> ones with 15000 spun up all IPC handlers and then errored out as shown in
> the logs below. These are just the logs around the error; before the error
> there were a few more timeouts.
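The 15000-handler failure mode can be sanity-checked with quick arithmetic: each IPC handler is a Java thread, and thread stacks are allocated outside the heap. Assuming the common 64-bit HotSpot default of 1 MB per thread stack (an assumption; the real value depends on -Xss and the platform), 15000 handlers reserve on the order of 15 GB of stack address space on top of the 48 GB heap:

```python
# Back-of-the-envelope: non-heap memory reserved by handler thread stacks.
# stack_kb is an assumed JVM default (-Xss); adjust to your actual setting.
stack_kb = 1024            # assumed 1 MB per thread stack
handlers = 15000           # handler count on the failing regionservers
reserved_gb = handlers * stack_kb / (1024 * 1024)
print(f"~{reserved_gb:.1f} GB of thread stacks for {handlers} handlers")
```

Even where the OS can map that much address space, creating 15000 threads at startup is slow and can stall the JVM long enough to miss ZooKeeper heartbeats.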
> 
> I checked the zookeeper servers (I have a 3-node cluster) and they did not
> GC around the same time, nor were they under any kind of load.
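On the timeout itself: the roughly 40-second expiry in the logs below is consistent with the ZooKeeper server capping negotiated sessions at maxSessionTimeout, which defaults to 20 × tickTime (40 s with the usual tickTime of 2000 ms). A sketch of the two settings that would have to change together to ride out long pauses (the values here are illustrative, not recommendations):

```xml
<!-- hbase-site.xml: the session timeout HBase *requests*; the ZooKeeper
     server still caps it at its own maxSessionTimeout -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
```

```
# zoo.cfg: raise the server-side cap to match
# (maxSessionTimeout defaults to 20 * tickTime)
maxSessionTimeout=120000
```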
> 
> Thanks,
> Viral
> 
> Region Server Logs
> 2013-04-29 08:00:55,512 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=98.34 MB,
> free=11.61 GB, max=11.71 GB, blocks=0, accesses=0, hits=0, hitRatio=0,
> cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0,
> evicted=0, evictedPerRun=NaN
> 2013-04-29 08:02:35,674 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 40592ms for sessionid
> 0x703e48a8cfd81be6, closing socket connection and attempting reconnect
> 2013-04-29 08:02:36,286 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server 10.152.152.84:2181. Will not attempt to
> authenticate using SASL (Unable to locate a login configuration)
> 2013-04-29 08:02:36,287 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to 10.152.152.84:2181, initiating session
> 2013-04-29 08:02:36,288 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x703e48a8cfd81be6 has expired,
> closing socket connection
> 2013-04-29 08:03:16,287 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> <hostname>,60020,1367221255417:
> regionserver:60020-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6
> regionserver:60020-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6
> received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>        at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:389)
>        at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
>        at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>        at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2013-04-29 08:03:16,288 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> <hostname>,60020,1367221255417: Unhandled exception:
> org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> currently processing <hostname>,60020,1367221255417 as dead server
> org.apache.hadoop.hbase.YouAreDeadException:
> org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> currently processing <hostname>,60020,1367221255417 as dead server
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>        at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>        at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>        at
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
>        at
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
>        at
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
>        at
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:748)
>        at java.lang.Thread.run(Thread.java:662)
