hbase-issues mailing list archives

From "chunhui shen (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4899) Region would be assigned twice easily with continually killing server and moving region in testing environment
Date Thu, 01 Dec 2011 06:23:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160677#comment-13160677 ]

chunhui shen commented on HBASE-4899:
-------------------------------------

Test results from my QA environment:
{code}
Results :

Tests run: 1175, Failures: 0, Errors: 0, Skipped: 9

[INFO] 
[INFO] --- maven-surefire-plugin:2.11-TRUNK-HBASE-2:test (secondPartTestsExecution) @ hbase ---
[INFO] Tests are skipped.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:44:10.984s
[INFO] Finished at: Thu Dec 01 14:10:34 CST 2011
[INFO] Final Memory: 35M/380M
[INFO] ------------------------------------------------------------------------
{code}

Please check!
                
> Region would be assigned twice easily with continually killing server and moving region in testing environment
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4899
>                 URL: https://issues.apache.org/jira/browse/HBASE-4899
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>         Attachments: hbase-4899.patch, hbase-4899v2.patch, hbase-4899v3.patch
>
>
> Before assigning a region in ServerShutdownHandler#process, the master checks whether the region is in transition (RIT).
> However, this check doesn't work as expected in the following case:
> 1. Move region A from server B to server C.
> 2. Kill server B.
> 3. Start server B immediately.
> Let's see what happens in the code for the above case:
> {code}
> for step 1:
> 1.1 server B closes region A
> 1.2 master calls setOffline for region A (AssignmentManager#setOffline: this.regions.remove(regionInfo))
> 1.3 server C starts to open region A (not completed)
> for step 3:
> master runs ServerShutdownHandler#process() for server B
> {
> ..
> splitlog()
> ...
> List<RegionState> regionsInTransition =
>         this.services.getAssignmentManager()
>         .processServerShutdown(this.serverName);
> ...
> Skip regions that were in transition unless CLOSING or PENDING_CLOSE
> ...
> assign region
> }
> {code}
> In fact, when ServerShutdownHandler#process() calls this.services.getAssignmentManager().processServerShutdown(this.serverName), region A is in RIT (step 1.3 has not completed), but the returned List<RegionState> regionsInTransition doesn't contain it, because region A has already been removed from AssignmentManager.regions by AssignmentManager#setOffline in step 1.2.
> Therefore, region A will be assigned twice.
> In fact, killing and restarting a single server twice can also easily cause a region to be assigned twice.
> Apart from the above cause, there is another possibility:
> when ServerShutdownHandler#process() calls MetaReader.getServerUserRegions, the result includes a region that is in RIT at that moment.
> But by the time MetaReader.getServerUserRegions completes, the region has been opened on another server and is no longer in RIT.
> In our testing environment, where balancing, moving and killing are executed periodically, regions are often assigned twice, which is a nuisance because it affects other test cases.
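
The first case in the quoted description can be modelled in a few lines of Java. This is only a simplified sketch of the bookkeeping as the report describes it, not HBase code: the class name, the two maps, the string states and the helper methods are all hypothetical stand-ins for AssignmentManager.regions, the RIT set and ServerShutdownHandler#process, and the real master uses HRegionInfo and RegionState objects with locking that is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the master's bookkeeping (not the HBase classes).
class DoubleAssignSketch {
    // region name -> server currently hosting it (stand-in for AssignmentManager.regions)
    static final Map<String, String> regions = new HashMap<>();
    // region name -> transition state (stand-in for the regions-in-transition map)
    static final Map<String, String> regionsInTransition = new HashMap<>();

    // As described in the report: only regions still mapped to the dead server
    // in `regions` are reported back as being in transition.
    static List<String> processServerShutdown(String deadServer) {
        List<String> rit = new ArrayList<>();
        for (Map.Entry<String, String> e : regions.entrySet()) {
            if (e.getValue().equals(deadServer)
                    && regionsInTransition.containsKey(e.getKey())) {
                rit.add(e.getKey());
            }
        }
        return rit;
    }

    // Stand-in for the "skip regions in transition, assign the rest" part of
    // ServerShutdownHandler#process.
    static List<String> regionsToAssign(List<String> regionsFromMeta, String deadServer) {
        List<String> rit = processServerShutdown(deadServer);
        List<String> toAssign = new ArrayList<>();
        for (String region : regionsFromMeta) {
            if (!rit.contains(region)) {
                toAssign.add(region);
            }
        }
        return toAssign;
    }

    // Replays steps 1.1 to 1.3 and then the shutdown handling for server B.
    static List<String> simulate() {
        regions.clear();
        regionsInTransition.clear();
        regions.put("A", "B");                        // region A starts on server B

        regionsInTransition.put("A", "OFFLINE");      // 1.1/1.2: B closes A, master sets it offline
        regions.remove("A");                          // 1.2: setOffline removes A from regions
        regionsInTransition.put("A", "PENDING_OPEN"); // 1.3: server C is still opening A

        // Steps 2 and 3: B is killed; in this sketch the meta scan still lists A under B.
        List<String> fromMeta = Arrays.asList("A");
        return regionsToAssign(fromMeta, "B");
    }

    public static void main(String[] args) {
        // A is already being opened on C, yet it is selected for assignment again.
        System.out.println("Regions the shutdown handler would assign: " + simulate());
    }
}
```

In this model region A ends up in the list to assign even though it is still in transition to server C, because the lookup that should have flagged it goes through the map from which step 1.2 already removed it.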

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
