hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
Date Fri, 06 Jan 2012 18:35:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181500#comment-13181500 ]

ramkrishna.s.vasudevan commented on HBASE-5137:
-----------------------------------------------

@Ted
One more thing: we should abort even without checking the file system. If we check the
file system and it reports that the file system is fine, we do not abort, yet the log split
has not happened anyway.
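
To make the difference concrete, here is a minimal, self-contained sketch of the two control flows. It does not use the HBase classes: Abortable, LogSplit, FsCheck and both method names are hypothetical stand-ins, and the assumption that checkFileSystem() aborts only when the file system looks unavailable is taken from the behaviour described in this comment.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SplitLogAbortSketch {
    /** Hypothetical stand-in for the master's abort hook. */
    public interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** Hypothetical stand-in for waitOnSafeMode() followed by the log split. */
    public interface LogSplit {
        void run() throws IOException;
    }

    /** Hypothetical stand-in for checkFileSystem(): true means the FS looks healthy. */
    public interface FsCheck {
        boolean isHealthy();
    }

    /** Flow described in the issue: abort only when the FS check fails. */
    public static boolean splitAbortOnlyIfFsBad(LogSplit split, FsCheck fs, Abortable master) {
        try {
            split.run();
            return true;
        } catch (IOException e) {
            if (!fs.isHealthy()) {
                master.abort("File system not available", e);
            }
            // If the FS check passes, nothing aborts, but the split was still skipped.
            return false;
        }
    }

    /** Flow proposed in this comment: abort on any IOException from the split. */
    public static boolean splitAlwaysAbort(LogSplit split, Abortable master) {
        try {
            split.run();
            return true;
        } catch (IOException e) {
            master.abort("Log split did not complete", e);
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> aborts = new ArrayList<>();
        Abortable master = (why, cause) -> aborts.add(why);
        // Simulate the split failing while a later FS check reports a healthy FS.
        LogSplit failing = () -> { throw new IOException("waitOnSafeMode failed"); };

        boolean first = splitAbortOnlyIfFsBad(failing, () -> true, master);
        System.out.println("abort-only-if-FS-bad: split ok=" + first + ", aborts=" + aborts.size());

        boolean second = splitAlwaysAbort(failing, master);
        System.out.println("always-abort: split ok=" + second + ", aborts=" + aborts.size());
    }
}
```

In the first flow the master keeps running even though the split never happened, which is the case this comment is concerned about. In the second flow the failure always reaches the abort hook.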


                
> MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-5137
>                 URL: https://issues.apache.org/jira/browse/HBASE-5137
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>
> I am not sure if this bug was already raised in JIRA.
> In our test cluster we had a scenario where the RS had gone down and ServerShutdownHandler started with splitLog.
> But because HDFS was down, the waitOnSafeMode check threw an IOException.
> {code}
> try {
>   // If FS is in safe mode, just wait till out of it.
>   FSUtils.waitOnSafeMode(conf,
>     conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
>   splitter.splitLog();
> } catch (OrphanHLogAfterSplitException e) {
> {code}
> We catch the IOException here:
> {code}
> } catch (IOException e) {
>   checkFileSystem();
>   LOG.error("Failed splitting " + logDir.toString(), e);
> }
> {code}
> So the HLog split itself did not happen. We encountered about 4 regions that had recently been split on the crashed RS and were lost.
> Can we abort the Master in such scenarios? Please suggest.


        
