Date: Sat, 18 Feb 2012 02:42:02 +0000 (UTC)
From: "chunhui shen (Commented) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <514250602.53312.1329532922075.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <683625020.71992.1327418318532.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210772#comment-13210772 ]

chunhui shen commented on HBASE-5270:
-------------------------------------

{code}
+    // We set serverLoad with one region, it could differentiate with
+    // regionserver which is started just now
+    HServerLoad serverLoad = new HServerLoad();
+    serverLoad.setNumberOfRegions(1);
How you know it has a region?
{code}
We do this to mark the RS as one that was already running before, as opposed to a regionserver that has only just started.
(A regionserver that has just started has no regions, so when the master runs assignRootAndMeta we needn't expire it. Only the 0.90 version needs to do this, because rootLocation doesn't contain the startcode, so we can't be sure from the HServerAddress alone that it is the root server.)

> Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5270
>                 URL: https://issues.apache.org/jira/browse/HBASE-5270
>             Project: HBase
>          Issue Type: Sub-task
>          Components: master
>            Reporter: Zhihong Yu
>             Fix For: 0.94.0, 0.92.1
>
>         Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, sampletest.txt
>
>
> This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK:
> Reviewing 0.92v17
> isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere.
> Does isDeadRootServerInProgress need to be public? Ditto for meta version.
> This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier?
> Is there anything in place to stop us expiring a server twice if its carrying root and meta?
> What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta.
> I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all.
> It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here?
> Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works?
> Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired?
> This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch?
> This patch needs to have the v18 differences applied.