Date: Thu, 7 Jun 2012 15:28:23 +0000 (UTC)
From: "ramkrishna.s.vasudevan (JIRA)"
To: issues@hbase.apache.org
Message-ID: <352928233.48173.1339082903696.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <1618023484.26507.1338556357742.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (HBASE-6147) SSH and AM.joinCluster leads to region assignment inconsistency in many cases.

[ https://issues.apache.org/jira/browse/HBASE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291073#comment-13291073 ]

ramkrishna.s.vasudevan commented on HBASE-6147:
-----------------------------------------------

This patch does not update the comments. It is only meant to show the changes we need to make so that this problem is solved.
But for this to happen, HBASE-6060 should go in, and for trunk HBASE-6012 should go in as well. How HBASE-6060 helps, and how Chunhui's suggestion of waiting for master initialization helps, is explained below.

-> If any RS goes down while assignments are in progress, that will now be handled by HBASE-6060.
-> For join cluster and SSH, the following scenarios need to be considered:
   1> Clean cluster start up
   2> Partially clean start up

In the case of a clean cluster start up, we do a bulk assign. If any RS goes down while this is happening, then as per Chunhui's suggestion we will wait for the master to initialize. By that time bulk assign will have populated the region plans, including plans that point at the dead server. So when the master completes initialization, SSH will see that a few regions have a region plan on the dead server, and the new logic introduced in HBASE-6060 will go ahead with the assignment. No waiting is needed.

For the 2nd case, the server may already be dead by the time ProcessRIT decides to process the node. Previously, in
{code}
          addToRITandCallClose(regionInfo, RegionState.State.OFFLINE, rt);
          break;
        }
        regionsInTransition.put(encodedRegionName,
            getRegionState(regionInfo, RegionState.State.OPENING, rt));
        failoverProcessedRegions.put(encodedRegionName, regionInfo);
{code}
we were just putting the region into the RIT map as OPENING, but there would be no one to process it. With the latest patch we also add a region plan. Now even if the server goes down and SSH tries to process it, SSH will see the region plan (with HBASE-6060 and Chunhui's suggestion) and immediately trigger assignment.

We found that this may be needed even for 'RS_ZK_REGION_OPENED'. We will also do cluster testing.

Please review and provide your comments. Hopefully with these changes we need not depend on the timeout monitor.

@Chunhui
Please provide your thoughts on this. It would be nice if you could also test the patches for HBASE-6147, HBASE-6060 and HBASE-6012 together.
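To make the idea above concrete, here is a rough sketch of the intended behaviour. It is not the patch: RegionPlanSketch and all of its methods are hypothetical names made up for illustration, and the real AssignmentManager and ServerShutdownHandler code is considerably more involved. It only models two points from the description: shutdown handling is deferred until the master is initialized, and any region whose plan points at the dead server is reassigned immediately instead of being left for the timeout monitor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model of the behaviour described above.
 * None of these names are HBase's real classes or methods.
 */
final class RegionPlanSketch {
    // Encoded region name -> server the region is planned to open on.
    private final Map<String, String> regionPlans = new TreeMap<>();
    private boolean masterInitialized = false;

    /** Record a plan, as bulk assign or RIT processing would. */
    void recordPlan(String encodedRegionName, String destinationServer) {
        regionPlans.put(encodedRegionName, destinationServer);
    }

    void setMasterInitialized(boolean initialized) {
        masterInitialized = initialized;
    }

    String planFor(String encodedRegionName) {
        return regionPlans.get(encodedRegionName);
    }

    /**
     * Shutdown handling for a dead server. It refuses to run before the
     * master is initialized (the caller is expected to retry later), then
     * re-plans every region whose plan points at the dead server.
     * Returns the encoded names of the regions that were re-planned.
     */
    List<String> handleServerShutdown(String deadServer, String liveServer) {
        if (!masterInitialized) {
            throw new IllegalStateException("master not initialized yet; retry later");
        }
        List<String> reassigned = new ArrayList<>();
        for (Map.Entry<String, String> plan : regionPlans.entrySet()) {
            if (plan.getValue().equals(deadServer)) {
                plan.setValue(liveServer); // move the plan to a live server
                reassigned.add(plan.getKey());
            }
        }
        return reassigned;
    }
}
```

In this model a region that is only marked OPENING, with no plan recorded, would never be picked up by handleServerShutdown, which is the gap that adding a region plan is meant to close.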
> SSH and AM.joinCluster leads to region assignment inconsistency in many cases.
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-6147
>                 URL: https://issues.apache.org/jira/browse/HBASE-6147
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1, 0.94.0
>            Reporter: ramkrishna.s.vasudevan
>             Fix For: 0.92.3
>
>         Attachments: HBASE-6147.patch, HBASE-6147_trunk.patch
>
>
> We are facing few issues in the master restart and SSH going in parallel.
> Chunhui also suggested that we need to rework on this part. This JIRA is aimed at solving all such possibilities of region assignment inconsistency

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira