Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 5 May 2011 00:16:03 +0000 (UTC)
From: "Jieshan Bean (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: 
 <902341481.23290.1304554563301.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: 
 <239238470.1167.1303780563338.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-3820) Splitlog() executed while the
 namenode was in safemode may cause data-loss
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029066#comment-13029066 ] 

Jieshan Bean commented on HBASE-3820:
-------------------------------------

"dfs.exist(new Path("/"))" checks the dfs is available, at least, the dfs can be read.
"!checkDfsSafeMode(conf)" checks the dfs is not in safemode. because only in safemode, it can't be written.
So I think we can make it as an indirect way to check whether the dfs is writable.
Is that correct?
If not,what about do a test of writing a temp file to check whether the dfs is writable? 


> Splitlog() executed while the namenode was in safemode may cause data-loss
> --------------------------------------------------------------------------
>
>                 Key: HBASE-3820
>                 URL: https://issues.apache.org/jira/browse/HBASE-3820
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.2
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3820-MFSFix-90-V2.patch, HBASE-3820-MFSFix-90.patch
>
>
> I found this problem while the namenode went into safemode due to some unclear reasons. 
> There's one patch about this problem:
>    try {
>       HLogSplitter splitter = HLogSplitter.createLogSplitter(
>         conf, rootdir, logDir, oldLogDir, this.fs);
>       try {
>         splitter.splitLog();
>       } catch (OrphanHLogAfterSplitException e) {
>         LOG.warn("Retrying splitting because of:", e);
>         // An HLogSplitter instance can only be used once.  Get new instance.
>         splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
>           oldLogDir, this.fs);
>         splitter.splitLog();
>       }
>       splitTime = splitter.getTime();
>       splitLogSize = splitter.getSize();
>     } catch (IOException e) {
>       checkFileSystem();
>       LOG.error("Failed splitting " + logDir.toString(), e);
>       master.abort("Shutting down HBase cluster: Failed splitting hlog files...", e);
>     } finally {
>       this.splitLogLock.unlock();
>     }
> And it was really give some useful help to some extent, while the namenode process exited or been killed, but not considered the Namenode safemode exception.
>    I think the root reason is the method of checkFileSystem().
>    It gives out an method to check whether the HDFS works normally(Read and write could be success), and that maybe the original propose of this method. This's how this method implements:
>     DistributedFileSystem dfs = (DistributedFileSystem) fs;
>     try {
>       if (dfs.exists(new Path("/"))) {  
>         return;
>       }
>     } catch (IOException e) {
>       exception = RemoteExceptionHandler.checkIOException(e);
>     }
>    
>    I have check the hdfs code, and learned that while the namenode was in safemode ,the dfs.exists(new Path("/")) returned true. Because the file system could provide read-only service. So this method just checks the dfs whether could be read. I think it's not reasonable.
>     
>    

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira