hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: duplicate regionserver entries
Date Mon, 01 Mar 2010 19:37:22 GMT



> Date: Mon, 1 Mar 2010 10:46:59 -0800
> Subject: duplicate regionserver entries
> From: yuzhihong@gmail.com
> To: hbase-user@hadoop.apache.org
> 
> Hi,
> We use hbase 0.20.1
> On http://snv-it-lin-006:60010/master.jsp, I see two rows for the same
> region server:
> snv-it-lin-010.projectrialto.com:60030 1267038448430
>     requests=0, regions=25, usedHeap=1280, maxHeap=6127
> snv-it-lin-010.projectrialto.com:60030 1267466540070
>     requests=0, regions=2, usedHeap=1278, maxHeap=6127
> But in regionservers on master server, snv-it-lin-010 is only specified
> once:

> Has anyone seen a similar thing before?


Funny you should mention this.
I was about to post something on the same topic...

What do you see when you run status 'simple' in an HBase shell?

On our Dev Cloud I see the following:

hbase(main):003:0> status 'simple'
6 live servers
    dchilcmsdn03[redacted]com:60020 1267351339661
        requests=0, regions=0, usedHeap=48, maxHeap=1777
    dchilcmsdn01[redacted]com:60020 1267216506258
        requests=0, regions=2, usedHeap=96, maxHeap=1777
    dchilcmsdn02[redacted]com:60020 1267466817617
        requests=0, regions=0, usedHeap=26, maxHeap=1777
    dchilcmsdn01[redacted]com:60020 1267351329701
        requests=0, regions=0, usedHeap=71, maxHeap=1777
    dchilcmsdn03.[redacted]com:60020 1267216506597
        requests=0, regions=1, usedHeap=43, maxHeap=1777
    dchilcmsdn02[redacted]com:60020 1267216506428
        requests=0, regions=2, usedHeap=82, maxHeap=1777
0 dead servers
hbase(main):004:0>

Here I have 3 servers and one master. Over the weekend with no real users, it looks like the
three servers had to restart themselves.
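
For what it's worth, here is a rough sketch of how one might spot these duplicates automatically
rather than by eye. It assumes the output of status 'simple' has been saved as text, and it
assumes the number after host:port is a start code in epoch milliseconds. The host names in the
sample and the function names (find_duplicates, start_time) are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample in the same layout as the status 'simple' output above.
# Host names are invented; the start codes are taken from the listing.
SAMPLE = """\
3 live servers
    rs1.example.com:60020 1267216506258
        requests=0, regions=2, usedHeap=96, maxHeap=1777
    rs1.example.com:60020 1267351329701
        requests=0, regions=0, usedHeap=71, maxHeap=1777
    rs2.example.com:60020 1267466817617
        requests=0, regions=0, usedHeap=26, maxHeap=1777
0 dead servers
"""

# Matches a server line: "host:port startcode" with nothing else on the line.
SERVER_LINE = re.compile(r"^\s*(\S+:\d+)\s+(\d+)\s*$")


def find_duplicates(status_text):
    """Map each host:port listed more than once to its sorted start codes."""
    seen = defaultdict(list)
    for line in status_text.splitlines():
        match = SERVER_LINE.match(line)
        if match:
            seen[match.group(1)].append(int(match.group(2)))
    return {addr: sorted(codes) for addr, codes in seen.items() if len(codes) > 1}


def start_time(start_code):
    """Interpret a start code as epoch milliseconds (an assumption), in UTC."""
    return datetime.fromtimestamp(start_code / 1000, tz=timezone.utc)


if __name__ == "__main__":
    for addr, codes in find_duplicates(SAMPLE).items():
        print(addr, [start_time(code).isoformat() for code in codes])
```

If the start codes really are start times, the older entry for each duplicated server would be
the stale one left over from before the restart.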

When I try to run the list command in the HBase shell, I get the following:

hbase(main):002:0> list
NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server null for region , row '', but failed after 7 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException:
-ROOT-,,0


We saw the same problems on our sandbox environment; however, there we were using VMware to split
a bunch of 8-core blades into two virtual servers with 3 cores each (giving us 10 nodes
instead of 5). Since we're seeing the same type of problem on the Dev Cloud, we can now rule out
VMware as a possible culprit.

I saw some of the posts by St. Ack and others in the mail archives, and I think that what
we may be experiencing are issues caused by periodic bursts of heavy network traffic.
That's just my guess, since these issues happen at times when there is no load on the system.

So I have to wonder how much of a factor network latency is. Normally we see sub-millisecond
response times, but we can also see bursts of network latency of 100-200 ms or longer.

Is there something I can tune to account for these? Or am I missing something?

Thx

-Mike


 		 	   		  