Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 150E31751C for ; Fri, 27 Mar 2015 05:22:55 +0000 (UTC) Received: (qmail 65981 invoked by uid 500); 27 Mar 2015 05:22:54 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 65940 invoked by uid 500); 27 Mar 2015 05:22:54 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 65928 invoked by uid 99); 27 Mar 2015 05:22:54 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 27 Mar 2015 05:22:54 +0000 Date: Fri, 27 Mar 2015 05:22:54 +0000 (UTC) From: "Lars Hofhansl (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once. MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383352#comment-14383352 ] Lars Hofhansl commented on HBASE-13337: --------------------------------------- Right. But when all region servers are gone, HMaster cannot assign any regions. As soon as the first few region servers are seen by the master again, it'll assign all the regions to them. In some case that might be what you want, in others it's not. Unless we tell the master it can't know. We can add some guessing of course. Maybe when more than 1/2 of the region server are suddenly gone, wait a bit before reassigning the regions. But I will bet that any guessing we do will have somebody complain. Or we can invent a new API to tell the master to hold off assigning regions. In the end the the solution is simple: If you plan to bring down many region servers, stop the HMaster first. > Table regions are not assigning back, after restarting all regionservers at once. > --------------------------------------------------------------------------------- > > Key: HBASE-13337 > URL: https://issues.apache.org/jira/browse/HBASE-13337 > Project: HBase > Issue Type: Bug > Components: Region Assignment > Affects Versions: 2.0.0 > Reporter: Y. SREENIVASULU REDDY > Assignee: Ashish Singhi > Priority: Blocker > Fix For: 2.0.0 > > > Regions of the table are continouly in state=FAILED_CLOSE. > {noformat} > Region State RIT time (ms) > 8f62e819b356736053e06240f7f7c6fd t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113929 > caf59209ae65ea80fca6bdc6996a7d68 t1,dddddddd,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM2,16040,1427362533691 113929 > db52a74988f71e5cf257bbabf31f26f3 t1,44444444,1427362431330.db52a74988f71e5cf257bbabf31f26f3. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM3,16040,1427362533691 113920 > 43f3a65b9f9ff283f598c5450feab1f8 t1,88888888,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113920 > {noformat} > *Steps to reproduce:* > 1. Start HBase cluster with more than one regionserver. > 2. Create a table with precreated regions. (lets say 15 regions) > 3. Make sure the regions are well balanced. > 4. Restart all the Regionservers process at once across the cluster, except HMaster process > 5. After restarting the Regionservers, successfully will connect to the HMaster. > *Bug:* > But no regions are assigning back to the Regionservers. > *Master log shows as follows:* > {noformat} > 2015-03-26 15:05:36,201 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} > 2015-03-26 15:05:36,202 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_OPEN&sn=VM1,16040,1427362531818 > 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Force region state offline {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} > 2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818} > 2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_CLOSE > 2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10 > 2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10 > 2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10 > 2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of 10 > 2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=5 of 10 > 2015-03-26 15:05:36,250 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=6 of 10 > 2015-03-26 15:05:36,250 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=7 of 10 > 2015-03-26 15:05:36,250 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=8 of 10 > 2015-03-26 15:05:36,251 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=9 of 10 > 2015-03-26 15:05:36,251 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=10 of 10 > 2015-03-26 15:05:36,251 WARN [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Failed to open/close 8f62e819b356736053e06240f7f7c6fd on VM1,16040,1427362531818, set to FAILED_CLOSE > 2015-03-26 15:05:36,251 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=FAILED_CLOSE, ts=1427362536251, server=VM1,16040,1427362531818} > 2015-03-26 15:05:36,251 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=FAILED_CLOSE > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)