Date: Wed, 21 Mar 2012 17:05:41 +0000 (UTC)
From: "Lars Hofhansl (Updated) (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-3809) .META. may not come back online if > number of executors servers crash and one of those > number of executors was carrying meta

    [ https://issues.apache.org/jira/browse/HBASE-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-3809:
---------------------------------

    Fix Version/s:     (was: 0.94.0)
                   0.96.0

Moving out of 0.94.

> .META. may not come back online if > number of executors servers crash and one of those > number of executors was carrying meta
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3809
>                 URL: https://issues.apache.org/jira/browse/HBASE-3809
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.96.0
>
>
> This is a duplicate of another issue, but at the moment I cannot find the original.
> Suppose you have a 700-node cluster, you run something on it that kills 100 nodes, and .META. was running on one of the downed nodes. All of your master executors will then be busy processing ServerShutdowns, and more than likely none of the currently running executors will be handling the shutdown of the server that was carrying .META.
> Currently, server shutdown handling needs .META. to be online in order to complete. So in the above case we are stuck: the running executors cannot finish and free up a slot for processing the server that carried .META., because they need .META. to complete.
> We can make the master handlers unbounded so the pool expands to accommodate all crashed servers (and so picks up the one that carried .META.), or we can change shutdown handling so that it doesn't require .META.
> to be online (it's used to figure out which regions the server was carrying); we could use the master's in-memory picture of the cluster instead (but IIRC, there may be holes ... TBD).
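The stall described in the issue can be reproduced in miniature. The following is a hypothetical simulation, not HBase code: it uses plain java.util.concurrent pools rather than the master's own executor classes, and the class name, method name, pool size of 3 and count of 5 dead servers are illustrative assumptions standing in for the 700/100-node scenario. Each ordinary shutdown handler blocks until .META. is back online, while the handler for the server that carried .META. is queued last.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of master shutdown handlers waiting on .META.
public class ShutdownDeadlockSketch {

    /**
     * Submits one shutdown handler per dead server; the last one carried .META.
     * Returns true if every handler finished within timeoutMs.
     */
    static boolean runShutdowns(ExecutorService pool, int deadServers, long timeoutMs) {
        CountDownLatch metaOnline = new CountDownLatch(1);        // opened once .META. is reassigned
        CountDownLatch done = new CountDownLatch(deadServers);   // one count per finished handler

        for (int i = 0; i < deadServers; i++) {
            boolean carriedMeta = (i == deadServers - 1);  // worst case: queued behind all others
            pool.submit(() -> {
                try {
                    if (carriedMeta) {
                        // Bringing .META. back does not itself need .META. online.
                        metaOnline.countDown();
                    } else {
                        // Ordinary handlers read .META. to learn which regions the
                        // dead server was carrying, so they block until it returns.
                        metaOnline.await();
                    }
                    done.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // handler abandoned by shutdownNow()
                }
            });
        }

        boolean finished;
        try {
            finished = done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        } finally {
            pool.shutdownNow();  // interrupt handlers still blocked on metaOnline
        }
        return finished;
    }

    public static void main(String[] args) {
        // Bounded: 3 threads all wait on .META.; the .META. handler never gets a thread.
        System.out.println("bounded:   " + runShutdowns(Executors.newFixedThreadPool(3), 5, 1000));
        // Unbounded: a thread is started per queued shutdown, so the .META. handler runs.
        System.out.println("unbounded: " + runShutdowns(Executors.newCachedThreadPool(), 5, 1000));
    }
}
```

With the fixed pool, the first three handlers occupy every thread while waiting on .META., so the .META. handler stays in the queue and runShutdowns returns false after the timeout. The cached pool corresponds to the first remedy in the issue (no bound on the handlers). The second remedy, deriving the dead server's regions from the master's in-memory state, would instead remove the metaOnline.await() dependency from ordinary handlers.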