Date: Wed, 8 Jun 2011 02:45:58 +0000 (UTC)
From: "Jieshan Bean (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1967334207.2115.1307501158795.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <640850820.61502.1306976207558.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HBASE-3946) The splitted region can be online again while the standby hmaster becomes the active one

[
https://issues.apache.org/jira/browse/HBASE-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-3946:
--------------------------------

    Attachment: HBASE-3946-V2.patch

Sorry, the previous patch had the wrong name. This patch only changes the code formatting in two places.

> The splitted region can be online again while the standby hmaster becomes the active one
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-3946
>                 URL: https://issues.apache.org/jira/browse/HBASE-3946
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3926-V2.patch, HBASE-3946-V2.patch, HBASE-3946.patch
>
>
> (The cluster has two HMasters, one active and one standby.)
> 1. When the active HMaster shuts down, the standby becomes the active one and reaches the call to processFailover():
>     if (regionCount == 0) {
>       LOG.info("Master startup proceeding: cluster startup");
>       this.assignmentManager.cleanoutUnassigned();
>       this.assignmentManager.assignAllUserRegions();
>     } else {
>       LOG.info("Master startup proceeding: master failover");
>       this.assignmentManager.processFailover();
>     }
> 2. After that, the user regions are rebuilt:
>     Map<HServerInfo, List<Pair<HRegionInfo, Result>>> deadServers = rebuildUserRegions();
> 3. Here is how rebuildUserRegions works. Every region whose server is not online, including regions that have already been split, is added to the offlineRegions list in offlineServers:
>     for (Result result : results) {
>       Pair<HRegionInfo, HServerInfo> region =
>         MetaReader.metaRowToRegionPairWithInfo(result);
>       if (region == null) continue;
>       HServerInfo regionLocation = region.getSecond();
>       HRegionInfo regionInfo = region.getFirst();
>       if (regionLocation == null) {
>         // Region not being served, add to region map with no assignment
>         // If this needs to be assigned out, it will also be in ZK as RIT
>         this.regions.put(regionInfo, null);
>       } else if (!serverManager.isServerOnline(
>           regionLocation.getServerName())) {
>         // Region is located on a server that isn't online
>         List<Pair<HRegionInfo, Result>> offlineRegions =
>           offlineServers.get(regionLocation);
>         if (offlineRegions == null) {
>           offlineRegions = new ArrayList<Pair<HRegionInfo, Result>>(1);
>           offlineServers.put(regionLocation, offlineRegions);
>         }
>         offlineRegions.add(new Pair<HRegionInfo, Result>(regionInfo, result));
>       } else {
>         // Region is being served and on an active server
>         regions.put(regionInfo, regionLocation);
>         addToServers(regionLocation, regionInfo);
>       }
>     }
> 4. It seems that all the offline regions are then added to RIT and brought online again:
> ZKAssign creates a node for each offline region and never considers whether the region has been split.
> AssignmentManager#processDeadServers
>   private void processDeadServers(
>       Map<HServerInfo, List<Pair<HRegionInfo, Result>>> deadServers)
>   throws IOException, KeeperException {
>     for (Map.Entry<HServerInfo, List<Pair<HRegionInfo, Result>>> deadServer :
>         deadServers.entrySet()) {
>       List<Pair<HRegionInfo, Result>> regions = deadServer.getValue();
>       for (Pair<HRegionInfo, Result> region : regions) {
>         HRegionInfo regionInfo = region.getFirst();
>         Result result = region.getSecond();
>         // If region was in transition (was in zk) force it offline for reassign
>         try {
>           ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
>               master.getServerName());
>         } catch (KeeperException.NoNodeException nne) {
>           // This is fine
>         }
>         // Process with existing RS shutdown code
>         ServerShutdownHandler.processDeadRegion(regionInfo, result, this,
>             this.catalogTracker);
>       }
>     }
>   }
> AssignmentManager#processFailover
>     // Process list of dead servers
>     processDeadServers(deadServers);
>     // Check existing regions in transition
>     List<String> nodes = ZKUtil.listChildrenAndWatchForNewChildren(watcher,
>         watcher.assignmentZNode);
>     if (nodes.isEmpty()) {
>       LOG.info("No regions in transition in ZK to process on failover");
>       return;
>     }
>     LOG.info("Failed-over master needs to process " + nodes.size() +
>         " regions in transition");
>     for (String encodedRegionName: nodes) {
>       processRegionInTransition(encodedRegionName, null);
>     }
> So I think each region should be checked for having been split before it is added to RIT.
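
The check suggested at the end of the report could look roughly like the sketch below. It is not the attached patch and does not use the HBase classes: RegionStub is a hypothetical stand-in for HRegionInfo that carries only an offline flag and a split flag, and regionsToReassign is an illustrative name. The assumption is that a region which is both offline and split is a split parent whose daughters now serve its key range, so it should not get an OFFLINE node in ZooKeeper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitAwareFailover {

    /** Hypothetical stand-in for HRegionInfo; holds only the flags used here. */
    static final class RegionStub {
        final String name;
        final boolean offline;
        final boolean split;

        RegionStub(String name, boolean offline, boolean split) {
            this.name = name;
            this.offline = offline;
            this.split = split;
        }
    }

    /** True for a parent region that has been split and taken offline. */
    static boolean isSplitParent(RegionStub region) {
        return region.offline && region.split;
    }

    /** Returns the names of a dead server's regions that should be put into RIT. */
    static List<String> regionsToReassign(List<RegionStub> deadServerRegions) {
        List<String> toReassign = new ArrayList<String>();
        for (RegionStub region : deadServerRegions) {
            if (isSplitParent(region)) {
                // Assumption: the daughters cover this key range, so skip the parent.
                continue;
            }
            toReassign.add(region.name);
        }
        return toReassign;
    }

    public static void main(String[] args) {
        List<RegionStub> regions = Arrays.asList(
            new RegionStub("parent", true, true),
            new RegionStub("daughterA", false, false),
            new RegionStub("daughterB", false, false));
        // Prints [daughterA, daughterB]
        System.out.println(regionsToReassign(regions));
    }
}
```

In processDeadServers, the equivalent guard would sit in front of the ZKAssign.createOrForceNodeOffline call, so that a split parent never gets a node created for it.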