hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jieshan Bean (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-4031) An imbalance result calculated by LoadBalancer
Date Sat, 25 Jun 2011 00:54:47 GMT
An imbalance result calculated by LoadBalancer
----------------------------------------------

                 Key: HBASE-4031
                 URL: https://issues.apache.org/jira/browse/HBASE-4031
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.90.3
            Reporter: Jieshan Bean
             Fix For: 0.90.4


  I found the problem while the cluster couldn't balance(Around time of 2011-05-24 11:28).One
node's regions count is the double of the other nodes. And it didn't move regions anymore:
   Address Start Code Load
158-1-101-202:20030 1306205409671 requests=0, regions=2593, usedHeap=114, maxHeap=8165 158-1-101-222:20030
1306205940117 requests=0, regions=5841, usedHeap=80, maxHeap=8165 158-1-101-52:20030 1306205417261
requests=0, regions=2622, usedHeap=76, maxHeap=8165 158-1-101-82:20030 1306205415714 requests=0,
regions=2633, usedHeap=69, maxHeap=8165 
Total:  servers: 4   requests=0, regions=13689


  HBASE-3985-"Same Region could be picked out twice in LoadBalancer" was found by my analysis
on this problem.
  But I'm afraid it's not the main cause of the problem.

  There's one active master, one standby master, four regionservers in our cluster.

>>10:57:41, the standby hamster 222 becomes the active one.
2011-05-24 10:57:41,314 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding:
master failover

>>4 regionservers was registered in 222 one by one. Only one regionserver seemed some
time late.
2011-05-24 10:57:37,533 INFO : Registering server=158-1-101-82,20020,1306205415714, regionCount=3388,
userLoad=true
2011-05-24 10:57:37,537 INFO : Registering server=158-1-101-202,20020,1306205409671, regionCount=3453,
userLoad=true
2011-05-24 10:57:37,598 INFO : Registering server=158-1-101-52,20020,1306205417261, regionCount=3411,
userLoad=true
2011-05-24 10:59:00,408 INFO : Registering server=158-1-101-222,20020,1306205940117, regionCount=0,
userLoad=false

>>13134 regions needed to move after rebuildUserRegions(13689 regions in the cluster
during the time).
2011-05-24 10:58:47,534 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over
master needs to process 13134 regions in transition

>>All the 13134 regions were opened, regions opened count in each server:
158-1-101-222,20020,1306205940117    Count: 834
158-1-101-82,20020,1306205415714    Count: 4093
158-1-101-202,20020,1306205409671    Count: 4118
158-1-101-52,20020,1306205417261    Count: 4089

>>The nearest balancer calculate results:
2011-05-24 11:12:11,076 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load
balance in 19ms. Moving 5012 regions off of 3 overloaded servers onto 1 less loaded servers

"5012" is an unimaginable number here, for it is larger than the average number "3424.5"


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message