hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10000) Initiate lease recovery for outstanding WAL files at the very beginning of recovery
Date Wed, 11 Dec 2013 06:00:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845114#comment-13845114 ]

Ted Yu commented on HBASE-10000:

The test failure (taking place around 2013-12-11 05:11:49) was not related to the patch.
For testTaskResigned():
    int version = ZKUtil.checkExists(zkw, tasknode);
    // Could be small race here.
    if (tot_mgr_resubmit.get() == 0) waitForCounter(tot_mgr_resubmit, 0, 1, to/2);
No log line similar to the following appeared (it would correspond to the waitForCounter() call above):
2013-12-10 21:23:54,905 INFO  [main] hbase.Waiter(174): Waiting up to [3,200] milli-secs(wait.for.ratio=[1])
This means the version retrieved (2) already corresponded to the resubmitted task. version1
then retrieved the same value, leading to the assertion failure.
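
To illustrate the guard discussed above, here is a minimal sketch, not the actual test code. The waitForCounter() body below is a simplified stand-in written for this example, and the class name WaitGuardSketch and the method guardedWait() are hypothetical. It only shows that when the resubmit counter is already non-zero at the time of the check, the wait is skipped entirely, which would be consistent with the missing "Waiting up to" log line.

```java
import java.util.concurrent.atomic.AtomicLong;

public class WaitGuardSketch {

  /**
   * Simplified stand-in (assumption) for the test helper: polls until the
   * counter leaves oldVal or the timeout elapses, then reports whether the
   * counter equals newVal.
   */
  public static boolean waitForCounter(AtomicLong ctr, long oldVal, long newVal, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (ctr.get() == oldVal && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return ctr.get() == newVal;
  }

  /**
   * Mirrors the guard in the quoted snippet. Returns true if the wait was
   * entered, false if it was skipped because a resubmit had already been counted.
   */
  public static boolean guardedWait(AtomicLong resubmits, long timeoutMs) {
    if (resubmits.get() == 0) {
      waitForCounter(resubmits, 0, 1, timeoutMs);
      return true;
    }
    return false;
  }
}
```

With the counter already at 1, guardedWait() returns false without waiting, so any version read just before the guard may already belong to the resubmitted task.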

I placed breakpoints at the beginning of splitLogDistributed() and recoverDFSFileLease();
neither was hit.

> Initiate lease recovery for outstanding WAL files at the very beginning of recovery
> -----------------------------------------------------------------------------------
>                 Key: HBASE-10000
>                 URL: https://issues.apache.org/jira/browse/HBASE-10000
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>             Fix For: 0.98.1
>         Attachments: 10000-0.96-v5.txt, 10000-0.96-v6.txt, 10000-recover-ts-with-pb-2.txt,
> 10000-recover-ts-with-pb-3.txt, 10000-recover-ts-with-pb-4.txt, 10000-recover-ts-with-pb-5.txt,
> 10000-recover-ts-with-pb-6.txt, 10000-recover-ts-with-pb-7.txt, 10000-recover-ts-with-pb-7.txt,
> 10000-recover-ts-with-pb-8.txt, 10000-v4.txt, 10000-v5.txt, 10000-v6.txt
> At the beginning of recovery, master can send lease recovery requests concurrently for
> outstanding WAL files using a thread pool.
> Each split worker would first check whether the WAL file it processes is closed.
> Thanks to Nicolas Liochon and Jeffery; discussion with them gave rise to this idea.
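
The idea in the issue description (sending lease recovery requests concurrently from a thread pool) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the attached patches: the class name ConcurrentLeaseRecoverySketch, the method recoverAll(), and the recoverLease predicate are all hypothetical, and the predicate merely stands in for whatever file system call actually triggers lease recovery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

public class ConcurrentLeaseRecoverySketch {

  /**
   * Submits one lease recovery request per WAL path to a fixed-size pool and
   * returns, in input order, the paths whose request reported success.
   * recoverLease is a hypothetical stand-in for the real recovery call.
   */
  public static List<String> recoverAll(List<String> walPaths,
      Predicate<String> recoverLease, int poolSize) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      // Fire off all requests first so they can run concurrently.
      List<Future<Boolean>> results = new ArrayList<>();
      for (String path : walPaths) {
        Callable<Boolean> task = () -> recoverLease.test(path);
        results.add(pool.submit(task));
      }
      // Then collect the outcomes in the original order.
      List<String> recovered = new ArrayList<>();
      for (int i = 0; i < walPaths.size(); i++) {
        if (results.get(i).get()) {
          recovered.add(walPaths.get(i));
        }
      }
      return recovered;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for lease recovery", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("lease recovery task failed", e);
    } finally {
      pool.shutdown();
    }
  }
}
```

A split worker could then consult the returned list (or an equivalent "is closed" check) before it starts reading a WAL file; how that check is done in practice is outside this sketch.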

This message was sent by Atlassian JIRA