hbase-user mailing list archives

From Cyril Scetbon <cyril.scet...@free.fr>
Subject Re: hosts unreachables
Date Fri, 01 Jun 2012 14:40:22 GMT
Another regionserver (hb-d2) has crashed (I can easily reproduce the 
issue by continuing to inject data). As I see in the master log, the 
master logs information about hb-d2 every 5 minutes. I suppose that is 
how it determines whether a node is dead. However, it added hb-d2 to the 
dead servers list at 13:32:20, which is less than 5 minutes after the 
last time it logged the server information (13:27:36). Is that normal?

2012-06-01 13:02:36,309 DEBUG 
org.apache.hadoop.hbase.master.LoadBalancer: Server information: 
hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, 
hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, 
hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, 
hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, 
hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, 
hb-d9,60020,1338553124179=47
..
2012-06-01 13:07:36,319 DEBUG 
org.apache.hadoop.hbase.master.LoadBalancer: Server information: 
hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, 
hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, 
hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, 
hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, 
hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, 
hb-d9,60020,1338553124179=47
..
2012-06-01 13:12:36,328 DEBUG 
org.apache.hadoop.hbase.master.LoadBalancer: Server information: 
hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, 
hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, 
hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, 
hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, 
hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, 
hb-d9,60020,1338553124179=47
..
2012-06-01 13:17:36,337 DEBUG 
org.apache.hadoop.hbase.master.LoadBalancer: Server information: 
hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, 
hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, 
hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, 
hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, 
hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, 
hb-d9,60020,1338553124179=47
..
2012-06-01 13:22:36,346 DEBUG 
org.apache.hadoop.hbase.master.LoadBalancer: Server information: 
hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, 
hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, 
hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, 
hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, 
hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, 
hb-d9,60020,1338553124179=47
..
2012-06-01 13:27:36,353 DEBUG 
org.apache.hadoop.hbase.master.LoadBalancer: Server information: 
hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, 
hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, 
hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, 
hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, 
hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, 
hb-d9,60020,1338553124179=47
..
2012-06-01 13:32:20,048 INFO 
org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer 
ephemeral node deleted, processing expiration [hb-d2,60020,1338553126560]
2012-06-01 13:32:20,048 DEBUG 
org.apache.hadoop.hbase.master.ServerManager: 
Added=hb-d2,60020,1338553126560 to dead servers, submitted shutdown 
handler to be executed, root=false, meta=false
2012-06-01 13:32:20,048 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting 
logs for hb-d2,60020,1338553126560
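
The timing in the log excerpt above can be double-checked with a short 
script. This is only a sketch: the timestamps are copied from the log 
lines above, and the names used (parse, seconds_between, and the two 
constants) are made up for illustration. It measures the spacing between 
the LoadBalancer "Server information" lines and the delay between the 
last of them and the expiration of hb-d2.

```python
from datetime import datetime

# Timestamps copied verbatim from the master log excerpt above.
BALANCER_REPORTS = [
    "2012-06-01 13:02:36,309",
    "2012-06-01 13:07:36,319",
    "2012-06-01 13:12:36,328",
    "2012-06-01 13:17:36,337",
    "2012-06-01 13:22:36,346",
    "2012-06-01 13:27:36,353",
]

# Time at which RegionServerTracker logged the deleted ephemeral node of hb-d2.
EXPIRATION = "2012-06-01 13:32:20,048"

# Layout of the timestamps in the log (milliseconds after the comma).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse(stamp):
    """Parse a log timestamp such as '2012-06-01 13:02:36,309'."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def seconds_between(earlier, later):
    """Return the elapsed time in seconds between two log timestamps."""
    return (parse(later) - parse(earlier)).total_seconds()


# Spacing between consecutive LoadBalancer "Server information" lines.
report_intervals = [
    seconds_between(a, b)
    for a, b in zip(BALANCER_REPORTS, BALANCER_REPORTS[1:])
]

# Delay between the last LoadBalancer line and the expiration of hb-d2.
gap_to_expiration = seconds_between(BALANCER_REPORTS[-1], EXPIRATION)

print("report intervals (s):", report_intervals)
print("gap to expiration (s):", gap_to_expiration)
```

With these timestamps the LoadBalancer lines are about 300 seconds 
apart, and the expiration is logged roughly 283.7 seconds after the last 
one. Note that the expiration line comes from RegionServerTracker, not 
from LoadBalancer.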


On 6/1/12 3:25 PM, Cyril Scetbon wrote:
> I've added hbase.hregion.memstore.mslab.enabled = true to the 
> configuration of all regionservers and added the flags -XX:+UseParNewGC 
> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
> -XX:CMSInitiatingOccupancyFraction=60 to the HBase environment.
> However, my regionservers are still crashing when I load data into the 
> cluster.
>
> Here are the logs for the node hb-d3 that crashed at 12:56
>
> - GC logs : http://pastebin.com/T0d0y8pZ
> - regionserver logs : http://pastebin.com/n6v9x3XM
>
> thanks
>
> On 5/31/12 11:12 PM, Jean-Daniel Cryans wrote:
>> Both. Also, you could provide bigger log snippets (post them on
>> something like pastebin.com) so we could see more evidence of the issue.
>>
>> J-D
>>
>> On Thu, May 31, 2012 at 2:09 PM, Cyril Scetbon <cyril.scetbon@free.fr> wrote:
>>> On 5/31/12 11:00 PM, Jean-Daniel Cryans wrote:
>>>> What I'm seeing looks more like GC issues. Start reading this:
>>>> http://hbase.apache.org/book.html#gc
>>>>
>>>> J-D
>>> Hi,
>>>
>>> Really not sure, because I've enabled the GC's verbose option and I don't
>>> see anything taking a long time. Maybe I can check again on one node. On
>>> which node do you think I should check for GC issues?
>>>
>>>
>>> -- 
>>> Cyril SCETBON
>>>
>
>


-- 
Cyril SCETBON

