hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7878) recoverFileLease does not check return value of recoverLease
Date Thu, 21 Mar 2013 02:05:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608535#comment-13608535 ]

Ted Yu commented on HBASE-7878:

From https://builds.apache.org/job/PreCommit-HBASE-Build/4928//testReport/org.apache.hadoop.hbase.regionserver.wal/TestHLogSplit/testSplitWillNotTouchLogsIfNewHLogGetsCreatedAfterSplitStarted/:
2013-03-21 00:54:27,404 INFO  [ZombieNewLogWriterRegionServer] wal.TestHLogSplit$ZombieNewLogWriterRegionServer(1102):
Juliet file creator: created file /hbase/hlog/hlog.dat..juliet
2013-03-21 00:54:27,406 INFO  [split-log-closeStream-2] wal.HLogSplitter$OutputSink$2(1259):
Closed path /hbase/t1/ccc/recovered.edits/0000000000000000002.temp (wrote 100 edits in 221ms)
This means the fake HLog was created before the call to outputSink.finishWritingAndClose() below:
        throw new OrphanHLogAfterSplitException(
          "Discovered orphan hlog after split. " + fileSet.iterator().next() + " Maybe the "
            + "HRegionServer was not dead when we started");
      }
    } finally {
      status.setStatus("Finishing writing output logs and closing down.");
      splits = outputSink.finishWritingAndClose();
    }
    status.setStatus("Archiving logs after completed split");
    archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
That is, archiveLogs() was not skipped because the orphan check had already passed. This is a timing issue in the test.
One solution is to pass a Latch into splitLog() and wait on the latch just before the following check:
      FileStatus[] currFiles = fs.listStatus(srcDir);
      if (currFiles.length > processedLogs.size()
ZombieNewLogWriterRegionServer would count down the Latch once the fake HLog is written.
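
The latch idea can be sketched in isolation as follows. This is only an illustration, not the HBase test code: the class SplitLatchSketch, the in-memory srcDir list, and the method names are hypothetical stand-ins, and the 10 second timeout is an arbitrary assumption. It shows the splitter side blocking on a java.util.concurrent.CountDownLatch so that the directory listing is always taken after the zombie writer has created its file.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class SplitLatchSketch {
  // Hypothetical in-memory stand-in for the files listed under srcDir.
  static final List<String> srcDir = new CopyOnWriteArrayList<>();

  // Splitter side: wait for the latch, then compare the listing with the
  // number of logs already processed (mirrors the currFiles.length check).
  static boolean orphanVisibleAfterSplit(int processedLogs, CountDownLatch fakeLogWritten)
      throws InterruptedException {
    // Bounded wait (assumed value) so a broken test fails instead of hanging.
    if (!fakeLogWritten.await(10, TimeUnit.SECONDS)) {
      throw new IllegalStateException("fake HLog was never written");
    }
    return srcDir.size() > processedLogs;
  }

  static boolean demo() {
    srcDir.clear();
    srcDir.add("hlog.dat.0");
    // Count the processed logs before the zombie thread can add anything.
    int processedLogs = srcDir.size();
    CountDownLatch fakeLogWritten = new CountDownLatch(1);

    // Zombie side: create the fake log, then release the splitter.
    Thread zombie = new Thread(() -> {
      srcDir.add("hlog.dat..juliet");
      fakeLogWritten.countDown();
    }, "ZombieNewLogWriterRegionServer");
    zombie.start();

    try {
      boolean seen = orphanVisibleAfterSplit(processedLogs, fakeLogWritten);
      zombie.join();
      return seen;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("orphan detected: " + demo());
  }
}
```

Because countDown() happens-before a successful await() returns, the splitter is guaranteed to observe the new file regardless of thread scheduling, which removes the race seen in the test report.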
> recoverFileLease does not check return value of recoverLease
> ------------------------------------------------------------
>                 Key: HBASE-7878
>                 URL: https://issues.apache.org/jira/browse/HBASE-7878
>             Project: HBase
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.95.0, 0.94.6
>            Reporter: Eric Newton
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.95.0, 0.98.0, 0.94.7
>         Attachments: 7878.94, 7878-94.addendum, 7878-94.addendum2, 7878-trunk.addendum,
> 7878-trunk.addendum2, 7878-trunk-v10.txt, 7878-trunk-v11-test.txt, 7878-trunk-v12.txt, 7878-trunk-v13.txt,
> 7878-trunk-v13.txt, 7878-trunk-v2.txt, 7878-trunk-v3.txt, 7878-trunk-v4.txt, 7878-trunk-v5.txt,
> 7878-trunk-v6.txt, 7878-trunk-v7.txt, 7878-trunk-v8.txt, 7878-trunk-v9.txt, 7878-trunk-v9.txt
> I think this is a problem, so I'm opening a ticket so an HBase person takes a look.
> Apache Accumulo has moved its write-ahead log to HDFS. I modeled the lease recovery for
> Accumulo after HBase's lease recovery. During testing, we experienced data loss. I found
> it is necessary to wait until recoverLease returns true to know that the file has been truly
> closed. In FSHDFSUtils, the return result of recoverLease is not checked. In the unit tests
> created to check lease recovery in HBASE-2645, the return result of recoverLease is always
> I think FSHDFSUtils should be modified to check the return result, and wait until it
> returns true.
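
The behavior the reporter asks for, retrying until recoverLease returns true, might look roughly like the sketch below. It is not the FSHDFSUtils code and not the committed patch. The BooleanSupplier stands in for a call such as DistributedFileSystem.recoverLease(Path), which returns a boolean; the class name LeaseRecoverySketch, the attempt limit, and the pause are assumptions chosen only to keep the example self-contained.

```java
import java.util.function.BooleanSupplier;

class LeaseRecoverySketch {
  /**
   * Calls recoverLease repeatedly until it reports true or the attempts
   * run out. Returns true only if the file was reported closed.
   * recoverLease is a stand-in for the real HDFS call (an assumption here).
   */
  static boolean waitForLeaseRecovery(BooleanSupplier recoverLease,
                                      int maxAttempts, long pauseMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (recoverLease.getAsBoolean()) {
        return true; // lease recovered, the file is closed
      }
      try {
        Thread.sleep(pauseMillis); // give recovery time to complete
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false; // treat interruption as "not recovered"
      }
    }
    return false; // caller must not assume the file is closed
  }

  public static void main(String[] args) {
    // Simulated file system: recovery succeeds on the third call.
    int[] calls = {0};
    boolean recovered = waitForLeaseRecovery(() -> ++calls[0] >= 3, 5, 10L);
    System.out.println("recovered=" + recovered + " after " + calls[0] + " calls");
  }
}
```

The key point is that the caller acts on the boolean result rather than ignoring it; whether to bound the number of attempts or loop until a timeout is a design choice left open here.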
