Date: Thu, 5 May 2011 00:16:03 +0000 (UTC)
From: "Jieshan Bean (JIRA)"
To: issues@hbase.apache.org
Message-ID: <902341481.23290.1304554563301.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <239238470.1167.1303780563338.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-3820) Splitlog() executed while the namenode was in safemode may cause data-loss

[
https://issues.apache.org/jira/browse/HBASE-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029066#comment-13029066 ]

Jieshan Bean commented on HBASE-3820:
-------------------------------------

"dfs.exists(new Path("/"))" checks that the dfs is available, or at least that it can be read.
"!checkDfsSafeMode(conf)" checks that the dfs is not in safemode, because it is only in safemode that it can't be written.
So I think we can use the two together as an indirect way to check whether the dfs is writable. Is that correct?
If not, what about writing a temp file as a test of whether the dfs is writable?

> Splitlog() executed while the namenode was in safemode may cause data-loss
> --------------------------------------------------------------------------
>
>                 Key: HBASE-3820
>                 URL: https://issues.apache.org/jira/browse/HBASE-3820
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.2
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3820-MFSFix-90-V2.patch, HBASE-3820-MFSFix-90.patch
>
>
> I found this problem when the namenode went into safemode for reasons that are still unclear.
> There is one patch related to this problem:
>    try {
>       HLogSplitter splitter = HLogSplitter.createLogSplitter(
>         conf, rootdir, logDir, oldLogDir, this.fs);
>       try {
>         splitter.splitLog();
>       } catch (OrphanHLogAfterSplitException e) {
>         LOG.warn("Retrying splitting because of:", e);
>         // An HLogSplitter instance can only be used once.  Get new instance.
>         splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
>           oldLogDir, this.fs);
>         splitter.splitLog();
>       }
>       splitTime = splitter.getTime();
>       splitLogSize = splitter.getSize();
>     } catch (IOException e) {
>       checkFileSystem();
>       LOG.error("Failed splitting " + logDir.toString(), e);
>       master.abort("Shutting down HBase cluster: Failed splitting hlog files...", e);
>     } finally {
>       this.splitLogLock.unlock();
>     }
> It does help to some extent when the namenode process has exited or been killed, but it does not consider the namenode safemode exception.
> I think the root cause is the checkFileSystem() method.
> It is meant to check whether HDFS is working normally (that is, both reads and writes succeed), which was probably the original purpose of the method. This is how it is implemented:
>     DistributedFileSystem dfs = (DistributedFileSystem) fs;
>     try {
>       if (dfs.exists(new Path("/"))) {
>         return;
>       }
>     } catch (IOException e) {
>       exception = RemoteExceptionHandler.checkIOException(e);
>     }
>
> I have checked the hdfs code and learned that while the namenode is in safemode, dfs.exists(new Path("/")) returns true, because the file system can still provide read-only service. So this method only checks whether the dfs can be read. I don't think that is reasonable.
>
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
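
The temp-file write probe suggested in the comment could be sketched roughly as follows. This is only an illustration of the idea, not the patch attached to the issue and not HBase or Hadoop code: the ProbeFs interface, the FakeFs in-memory stand-in, and the probe file name are all hypothetical, chosen so the sketch runs without a cluster. A real check would go through the file system handle the master already holds.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class DfsWriteProbe {

    /** Hypothetical minimal file-system view; this is NOT the Hadoop FileSystem API. */
    public interface ProbeFs {
        boolean exists(String path) throws IOException;
        void createFile(String path) throws IOException;
        void delete(String path) throws IOException;
    }

    /** In-memory stand-in (assumption) that serves reads but rejects writes in "safe mode". */
    public static final class FakeFs implements ProbeFs {
        private final Set<String> files = new HashSet<>();
        private final boolean safeMode;

        public FakeFs(boolean safeMode) {
            this.safeMode = safeMode;
            files.add("/");
        }

        @Override
        public boolean exists(String path) {
            return files.contains(path);
        }

        @Override
        public void createFile(String path) throws IOException {
            if (safeMode) {
                throw new IOException("Cannot create " + path + ": name node is in safe mode");
            }
            files.add(path);
        }

        @Override
        public void delete(String path) throws IOException {
            if (safeMode) {
                throw new IOException("Cannot delete " + path + ": name node is in safe mode");
            }
            files.remove(path);
        }
    }

    /** Read-only check, equivalent in spirit to exists("/"): passes even in safe mode. */
    public static boolean isReadable(ProbeFs fs) {
        try {
            return fs.exists("/");
        } catch (IOException e) {
            return false;
        }
    }

    /** Write check: create and then remove a uniquely named temp file. */
    public static boolean isWritable(ProbeFs fs) {
        String probe = "/.write-probe-" + System.nanoTime();
        try {
            fs.createFile(probe);
            fs.delete(probe);
            return true;
        } catch (IOException e) {
            // Any failure to write is treated as "not writable".
            return false;
        }
    }

    public static void main(String[] args) {
        ProbeFs healthy = new FakeFs(false);
        ProbeFs inSafeMode = new FakeFs(true);
        System.out.println("healthy:  readable=" + isReadable(healthy)
            + " writable=" + isWritable(healthy));
        System.out.println("safemode: readable=" + isReadable(inSafeMode)
            + " writable=" + isWritable(inSafeMode));
    }
}
```

The point of the sketch is the difference between the two checks: the read-only check still passes on the safe-mode stand-in, while the write probe fails there, which is the gap the issue describes in checkFileSystem().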