hbase-issues mailing list archives

From "Samir Ahmic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.
Date Mon, 25 May 2015 13:53:17 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558272#comment-14558272 ]

Samir Ahmic commented on HBASE-13337:
-------------------------------------

I was able to find the root cause of this issue: it is in ServerManager#getRsAdmin(). After a
regionserver is restarted, the {code}ClusterConnection connection{code} becomes stale and needs
to be recreated so that the master can establish a connection with the restarted regionserver.
Here is a diff of what I did to fix this issue:
{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
index 1ed2514..a05df3b 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
@@ -930,8 +930,10 @@ public class ServerManager {
         // A master is also a region server now, see HBASE-10569 for details
         admin = ((HRegionServer)master).getRSRpcServices();
       } else {
-        admin = this.connection.getAdmin(sn);
-      }
+        Configuration conf = master.getConfiguration();
+        ClusterConnection connection = (ClusterConnection) ConnectionFactory.createConnection(conf);
+        admin = connection.getAdmin(sn);
+      }      
       this.rsAdmins.put(sn, admin);
     }
     return admin;
{code}
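
To make the failure mode concrete outside of HBase, here is a minimal, self-contained sketch. It is not HBase code: {{Server}}, {{Stub}}, {{stubFor}} and {{callWithRetries}} are hypothetical names, and the assumption that the cached handle is keyed by address is mine. It also uses a different technique from the diff above: instead of opening a new connection on every cache miss, it evicts the cached handle when a call fails with ClosedChannelException and reopens it on the next attempt.

```java
import java.nio.channels.ClosedChannelException;
import java.util.HashMap;
import java.util.Map;

public class StaleConnectionSketch {

  /** Hypothetical server process: same address across restarts, new start code each time. */
  static final class Server {
    final String address;
    long startCode;
    Server(String address, long startCode) { this.address = address; this.startCode = startCode; }
    void restart() { startCode++; }
  }

  /** Hypothetical RPC stub; it only works against the process it was opened to. */
  static final class Stub {
    private final Server server;
    private final long openedAgainst;
    Stub(Server server) { this.server = server; this.openedAgainst = server.startCode; }
    void call() throws ClosedChannelException {
      // Once the original process is gone, the channel behind this stub is closed.
      if (server.startCode != openedAgainst) {
        throw new ClosedChannelException();
      }
    }
  }

  /** Master-side cache of stubs, keyed by address (an assumption of this sketch). */
  private final Map<String, Stub> stubs = new HashMap<>();

  Stub stubFor(Server server) {
    return stubs.computeIfAbsent(server.address, a -> new Stub(server));
  }

  /** Returns the attempt on which the call succeeded, or -1 if every attempt failed. */
  int callWithRetries(Server server, int maxTries, boolean evictOnFailure) {
    for (int attempt = 1; attempt <= maxTries; attempt++) {
      try {
        stubFor(server).call();
        return attempt;
      } catch (ClosedChannelException e) {
        if (evictOnFailure) {
          stubs.remove(server.address); // the next attempt opens a fresh stub
        }
      }
    }
    return -1;
  }

  /** Warms the cache, restarts the server, then retries up to 10 times. */
  static int simulate(boolean evictOnFailure) {
    StaleConnectionSketch master = new StaleConnectionSketch();
    Server rs = new Server("VM1:16040", 1L);
    master.callWithRetries(rs, 10, evictOnFailure); // caches a stub before the restart
    rs.restart();
    return master.callWithRetries(rs, 10, evictOnFailure);
  }

  public static void main(String[] args) {
    System.out.println("reuse stale stub: " + simulate(false));
    System.out.println("evict and reopen: " + simulate(true));
  }
}
```

In this toy model, reusing the cached handle fails on all 10 tries ({{simulate(false)}} returns -1), which resembles the "try=1 of 10" to "try=10 of 10" sequence in the master log, while evicting on failure succeeds on the second try ({{simulate(true)}} returns 2). Whether eviction on failure or recreating the connection is the right fix inside ServerManager is a separate question that this sketch does not answer.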

Should I create a patch, or is there a better way to resolve this issue?


> Table regions are not assigning back, after restarting all regionservers at once.
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-13337
>                 URL: https://issues.apache.org/jira/browse/HBASE-13337
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.0.0
>            Reporter: Y. SREENIVASULU REDDY
>            Priority: Blocker
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13337.patch
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> Region	State	RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd	t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818	113929
> caf59209ae65ea80fca6bdc6996a7d68	t1,dddddddd,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM2,16040,1427362533691	113929
> db52a74988f71e5cf257bbabf31f26f3	t1,44444444,1427362431330.db52a74988f71e5cf257bbabf31f26f3. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM3,16040,1427362533691	113920
> 43f3a65b9f9ff283f598c5450feab1f8	t1,88888888,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818	113920
> {noformat}
> *Steps to reproduce:*
> 1. Start HBase cluster with more than one regionserver.
> 2. Create a table with precreated regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the regionserver processes at once across the cluster, but leave the HMaster process running.
> 5. After the restart, the regionservers connect to the HMaster successfully.
> *Bug:*
> But no regions are assigned back to the regionservers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Force region state offline {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=5 of 10
> 2015-03-26 15:05:36,250 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=6 of 10
> 2015-03-26 15:05:36,250 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=7 of 10
> 2015-03-26 15:05:36,250 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=8 of 10
> 2015-03-26 15:05:36,251 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=9 of 10
> 2015-03-26 15:05:36,251 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=10 of 10
> 2015-03-26 15:05:36,251 WARN  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Failed to open/close 8f62e819b356736053e06240f7f7c6fd on VM1,16040,1427362531818, set to FAILED_CLOSE
> 2015-03-26 15:05:36,251 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=FAILED_CLOSE, ts=1427362536251, server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,251 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=FAILED_CLOSE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
