hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13605) RegionStates should not keep its list of dead servers
Date Thu, 18 Jun 2015 01:11:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591074#comment-14591074
] 

Hadoop QA commented on HBASE-13605:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12740199/hbase-13605_v4-master.patch
  against master branch at commit 623fd63827b2953c150597f24c7205737119bebe.
  ATTACHMENT ID: 12740199

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 3 new or modified
tests.

    {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions
(2.4.1 2.5.2 2.6.0)

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the total number of
protoc compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 warning messages.

                {color:red}-1 checkstyle{color}.  The applied patch generated 1910 checkstyle
errors (more than the master's current 1907 errors).

    {color:green}+1 findbugs{color}.  The patch does not introduce any  new Findbugs (version
2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

    {color:red}-1 site{color}.  The patch appears to cause mvn site goal to fail.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.TestRegionRebalancing

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14450//testReport/
Release Findbugs (version 2.0.3) 	warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14450//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14450//artifact/patchprocess/checkstyle-aggregate.html

                Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14450//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14450//console

This message is automatically generated.

> RegionStates should not keep its list of dead servers
> -----------------------------------------------------
>
>                 Key: HBASE-13605
>                 URL: https://issues.apache.org/jira/browse/HBASE-13605
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>            Priority: Critical
>             Fix For: 2.0.0, 1.0.2, 1.1.1
>
>         Attachments: hbase-13605_v1.patch, hbase-13605_v3-branch-1.1.patch, hbase-13605_v4-branch-1.1.patch,
hbase-13605_v4-master.patch
>
>
> As mentioned in https://issues.apache.org/jira/browse/HBASE-9514?focusedCommentId=13769761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769761
and HBASE-12844 we should have only 1 source of cluster membership. 
> The list of dead server and RegionStates doing it's own liveliness check (ServerManager.isServerReachable())
has caused an assignment problem again in a test cluster where the region states "thinks"
that the server is dead and SSH will handle the region assignment. However the RS is not dead
at all, living happily, and never gets zk expiry or YouAreDeadException or anything. This
leaves the list of regions unassigned in OFFLINE state. 
> master assigning the region:
> {code}
> 15-04-20 09:02:25,780 DEBUG [AM.ZK.Worker-pool3-t330] master.RegionStates: Onlined 77dddcd50c22e56bfff133c0e1f9165b
on os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 {ENCODED => 77dddcd50c
> {code}
> Master then disabled the table, and unassigned the region:
> {code}
> 2015-04-20 09:02:27,158 WARN  [ProcedureExecutorThread-1] zookeeper.ZKTableStateManager:
Moving table loadtest_d1 state from DISABLING to DISABLING
>  Starting unassign of loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b. (offlining),
current state: {77dddcd50c22e56bfff133c0e1f9165b state=OPEN, ts=1429520545780,   server=os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268}
> bleProcedure$BulkDisabler-0] master.AssignmentManager: Sent CLOSE to os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
for region loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b.
> 2015-04-20 09:02:27,414 INFO  [AM.ZK.Worker-pool3-t316] master.RegionStates: Offlined
77dddcd50c22e56bfff133c0e1f9165b from os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}
> On table re-enable, AM does not assign the region: 
> {code}
> 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] balancer.BaseLoadBalancer:
Reassigned 25 regions. 25 retained the pre-restart assignment.ยท
> 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] procedure.EnableTableProcedure:
Bulk assigning 25 region(s) across 5 server(s), retainAssignment=true
> l,16000,1429515659726-GeneralBulkAssigner-4] master.RegionStates: Couldn't reach online
server os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> l,16000,1429515659726-GeneralBulkAssigner-4] master.AssignmentManager: Updating the state
to OFFLINE to allow to be reassigned by SSH
> nmentManager: Skip assigning loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b.,
it is on a dead but not processed yet server: os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message