Date: Tue, 10 Jan 2012 17:40:39 +0000 (UTC)
From: "ramkrishna.s.vasudevan (Updated) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <2048255974.26156.1326217239235.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1250688710.15239.1325868879381.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

[ 
https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5137:
------------------------------------------
    Fix Version/s: 0.90.6
                   0.92.1

Committed to 0.90 and trunk. Do we need to commit to 0.92 as well?

> MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-5137
>                 URL: https://issues.apache.org/jira/browse/HBASE-5137
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.92.1, 0.90.6
>
>         Attachments: 5137-trunk.txt, HBASE-5137.patch, HBASE-5137.patch
>
>
> I am not sure whether this bug has already been raised in JIRA.
> In our test cluster, an RS went down and ServerShutDownHandler started splitLog.
> But because HDFS was down, the waitOnSafeMode check threw an IOException.
> {code}
>     try {
>       // If FS is in safe mode, just wait till out of it.
>       FSUtils.waitOnSafeMode(conf,
>         conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
>       splitter.splitLog();
>     } catch (OrphanHLogAfterSplitException e) {
> {code}
> We catch the exception, log it, and carry on:
> {code}
>     } catch (IOException e) {
>       checkFileSystem();
>       LOG.error("Failed splitting " + logDir.toString(), e);
>     }
> {code}
> So the HLog split itself did not happen. We found that about 4 regions that had recently been split on the crashed RS were lost.
> Can we abort the Master in such scenarios? Please suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
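
For readers following the quoted description, here is a minimal sketch of the behaviour the issue title asks for: if either the safe-mode wait or the split itself throws an IOException, the failure is escalated to an abort instead of only being logged. This is not the committed HBASE-5137 patch, which is not shown in this message. `Abortable`, `IoStep`, `RecordingAbortable`, the `splitLog` signature and the example log directory are all hypothetical stand-ins, not HBase APIs.

```java
import java.io.IOException;

class SplitLogAbortSketch {

    /** Hypothetical stand-in for whatever lets the master abort itself. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** Hypothetical stand-in for a step that may fail with an IOException. */
    interface IoStep {
        void run() throws IOException;
    }

    /** Helper that only records whether an abort was requested. */
    static class RecordingAbortable implements Abortable {
        boolean aborted = false;
        String reason = null;

        @Override
        public void abort(String why, Throwable cause) {
            aborted = true;
            reason = why;
        }
    }

    /**
     * Runs the safe-mode wait and then the log split. Any IOException from
     * either step triggers an abort, so a skipped split cannot go unnoticed.
     *
     * @return true if both steps completed, false if an abort was requested
     */
    static boolean splitLog(IoStep waitOnSafeMode, IoStep split,
                            Abortable master, String logDir) {
        try {
            waitOnSafeMode.run();
            split.run();
            return true;
        } catch (IOException e) {
            // Assumption: aborting is preferred over continuing with unsplit logs.
            master.abort("Failed splitting " + logDir, e);
            return false;
        }
    }

    public static void main(String[] args) {
        RecordingAbortable master = new RecordingAbortable();
        // Simulate HDFS being down while waiting on safe mode.
        boolean ok = splitLog(
            () -> { throw new IOException("HDFS is down"); },
            () -> { },
            master,
            "/hbase/.logs/example-rs"); // hypothetical log directory
        System.out.println("split ok=" + ok + ", aborted=" + master.aborted);
    }
}
```

The point of the sketch is only that the safe-mode wait and the split share one failure path. Whether aborting the master is the right response in a real cluster is exactly the question the reporter raises.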