hbase-issues mailing list archives

From "jiraposter@reviews.apache.org (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice
Date Wed, 06 Jul 2011 14:27:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060603#comment-13060603 ]

jiraposter@reviews.apache.org commented on HBASE-4053:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/782/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

When the master fails over, addToServers() should check whether hris already contains the
region it is trying to add.
However, ArrayList is not the best data structure for looking up a specific HRegionInfo. Maybe
we should consider replacing it with, e.g., ConcurrentSkipListSet.
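
A minimal sketch of the two options mentioned above, not the actual patch. It assumes
String stand-ins for HServerInfo and HRegionInfo, and the class and method names
(ServerRegionIndex, addToListGuarded, addToSet) are hypothetical. The first method keeps the
ArrayList and guards the add with contains(), which scans the list; the second stores
regions in a ConcurrentSkipListSet, which rejects duplicates on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Illustration only: String keys stand in for HServerInfo and HRegionInfo.
class ServerRegionIndex {
    private final Map<String, List<String>> byList = new HashMap<>();
    private final Map<String, Set<String>> bySet = new HashMap<>();

    // Option 1: keep the list, but skip regions that are already present.
    // contains() is a linear scan, so each add costs O(n) in the list size.
    boolean addToListGuarded(String server, String region) {
        List<String> regions = byList.computeIfAbsent(server, k -> new ArrayList<>());
        if (regions.contains(region)) {
            return false; // already tracked, do not add twice
        }
        regions.add(region);
        return true;
    }

    // Option 2: use a sorted concurrent set; add() returns false on duplicates.
    // Lookups and inserts are O(log n) on average.
    boolean addToSet(String server, String region) {
        return bySet.computeIfAbsent(server, k -> new ConcurrentSkipListSet<>()).add(region);
    }

    int listCount(String server) {
        List<String> regions = byList.get(server);
        return regions == null ? 0 : regions.size();
    }

    int setCount(String server) {
        Set<String> regions = bySet.get(server);
        return regions == null ? 0 : regions.size();
    }
}
```

Note that ConcurrentSkipListSet needs elements that are Comparable (or an explicit
Comparator), so the element type's ordering must be consistent with equals() for the
duplicate check to behave as expected.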


This addresses bug HBASE-4053.
    https://issues.apache.org/jira/browse/HBASE-4053


Diffs
-----

  /src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1142537 

Diff: https://reviews.apache.org/r/782/diff


Testing
-------

Ran test suite.


Thanks,

Ted



> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here is the scenario in which the problem happened:
> 1. When HMaster starts, all regionservers check in OK, and the count of regions out on the cluster is 10083, which is the actual region count.
> 2. Then OpenedRegionHandler#process received ZooKeeper events and added 9923 regions to the hris list.
>    Those 9923 regions already existed in the list, but they were force-added anyway.
> 3. The LoadBalancer gets the wrong region count of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668
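
The numbers in the logs above can be reproduced with a small sketch. It is an illustration
only: integers stand in for HRegionInfo, the class and method names are hypothetical, and
the min/max formulas assume the balancer uses the floor and ceiling of the per-server
average, which matches the logged values but is not taken from the LoadBalancer source.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: integers stand in for HRegionInfo instances.
class DoubleCountSketch {
    // Size of the tracked list after `readded` already-present regions
    // are appended again without a duplicate check.
    static int trackedRegions(int actual, int readded) {
        List<Integer> hris = new ArrayList<>();
        for (int i = 0; i < actual; i++) {
            hris.add(i); // regions reported when regionservers check in
        }
        for (int i = 0; i < readded; i++) {
            hris.add(i); // same regions added again, unguarded
        }
        return hris.size();
    }

    // Assumed balancer bounds: floor and ceiling of the per-server average.
    static int floorPerServer(int numRegions, int numServers) {
        return numRegions / numServers;
    }

    static int ceilPerServer(int numRegions, int numServers) {
        return (numRegions + numServers - 1) / numServers;
    }
}
```

With the reported values, trackedRegions(10083, 9923) gives 20006, and for 3 servers the
bounds come out as 6668 and 6669, the same figures as in the LoadBalancer log line.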

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
