hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: One problem with LoadBalancer
Date Fri, 24 Jun 2011 18:46:24 GMT
I feel like I'm missing too much information to be helpful. For
example, when the standby master comes up it needs to process 13134 RIT.
What happened there? I thought the regions were all assigned? What happened
to the first master? How come 1306205940117 went from 5841 regions to
0?

Thx for filling the gaps,

J-D

On Thu, Jun 23, 2011 at 6:35 PM, bijieshan <bijieshan@huawei.com> wrote:
> Hi,
>
>  I found a problem where the cluster couldn't balance. One node's region count is double that of the other nodes, and it didn't move regions anymore:
> Address              Start Code     Load
> 158-1-101-202:20030  1306205409671  requests=0, regions=2593, usedHeap=114, maxHeap=8165
> 158-1-101-222:20030  1306205940117  requests=0, regions=5841, usedHeap=80, maxHeap=8165
> 158-1-101-52:20030   1306205417261  requests=0, regions=2622, usedHeap=76, maxHeap=8165
> 158-1-101-82:20030   1306205415714  requests=0, regions=2633, usedHeap=69, maxHeap=8165
> Total:  servers: 4   requests=0, regions=13689
>
>
>  While analyzing this problem I found HBASE-3985 ("Same Region could be picked out twice in LoadBalancer"),
>  but I'm afraid it's not the main cause of the problem.
>
>  There's one active master, one standby master, and four regionservers in our cluster.
>
>>>10:57:41, the standby master 222 becomes the active one.
> 2011-05-24 10:57:41,314 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding: master failover
>
>>>The 4 regionservers registered with 222 one by one. Only one regionserver registered noticeably later.
> 2011-05-24 10:57:37,533 INFO : Registering server=158-1-101-82,20020,1306205415714, regionCount=3388, userLoad=true
> 2011-05-24 10:57:37,537 INFO : Registering server=158-1-101-202,20020,1306205409671, regionCount=3453, userLoad=true
> 2011-05-24 10:57:37,598 INFO : Registering server=158-1-101-52,20020,1306205417261, regionCount=3411, userLoad=true
> 2011-05-24 10:59:00,408 INFO : Registering server=158-1-101-222,20020,1306205940117, regionCount=0, userLoad=false
>
>>>13134 regions needed to move after rebuildUserRegions (13689 regions in the cluster at the time).
> 2011-05-24 10:58:47,534 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 13134 regions in transition
>
>>>All 13134 regions were opened. Count of regions opened on each server:
> 158-1-101-222,20020,1306205940117    Count: 834
> 158-1-101-82,20020,1306205415714    Count: 4093
> 158-1-101-202,20020,1306205409671    Count: 4118
> 158-1-101-52,20020,1306205417261    Count: 4089
>
>>>The nearest balancer calculation result:
> 2011-05-24 11:12:11,076 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 19ms. Moving 5012 regions off of 3 overloaded servers onto 1 less loaded servers
>
> "5012" is an implausible number here, since it is larger than the average number "3424.5"
>
>
> Jieshan Bean
>
>
>
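
As a rough sanity check on the "5012" figure, the sketch below computes the fewest region moves that would bring every server to within one region of the average, using the per-server opened counts quoted above. It is not HBase's LoadBalancer algorithm: the function name `min_moves_to_balance` is hypothetical, and the sketch assumes the target is simply floor(avg)..ceil(avg) per server with no slop setting.

```python
def min_moves_to_balance(region_counts):
    """Lower bound on moves so every server holds floor(avg)..ceil(avg) regions.

    Assumption: balance means "within one region of the average"; this is an
    illustration, not the logic of org.apache.hadoop.hbase.master.LoadBalancer.
    """
    total = sum(region_counts)
    servers = len(region_counts)
    floor_avg = total // servers
    ceil_avg = -(-total // servers)  # ceiling division

    # Regions that must leave servers sitting above the ceiling.
    to_shed = sum(c - ceil_avg for c in region_counts if c > ceil_avg)
    # Regions that must arrive at servers sitting below the floor.
    to_fill = sum(floor_avg - c for c in region_counts if c < floor_avg)

    # Each move takes one region off one server and puts it on another,
    # so the larger of the two requirements is the minimum move count.
    return max(to_shed, to_fill)


# Opened counts per server from the quoted message (222, 82, 202, 52).
opened = [834, 4093, 4118, 4089]
print(sum(opened) / len(opened))     # 3283.5
print(min_moves_to_balance(opened))  # 2449
```

Under these assumptions about 2449 moves would be enough to even out the four servers, so the reported 5012 is more than twice that bound.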
