hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1142) Lease recovery doesn't reassign lease when triggered by append()
Date Tue, 18 May 2010 18:40:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868782#action_12868782 ]

Todd Lipcon commented on HDFS-1142:

Hey Konstantin,

I agree that this shouldn't be marked blocker while discussion is going on.

Let me better explain the context with regard to HBase. HBase already uses ZK to determine
regionserver liveness. If a region server dies, it loses its ZK session, and thus an ephemeral
znode disappears. The master notices this, initiates commitlog recovery for that server, and
eventually reassigns the regions elsewhere. To provide proper database-like semantics, we
need to ensure that once log recovery commences, the regionserver cannot write any more to
that log (otherwise writes might be lost forever).

Of course this all works fine if the regionserver has truly died. A big issue we face, though,
is one of long garbage collection pauses (sound familiar?). In some cases, the pauses can
last longer than the ZK session timeout. Thus, the HBase master decides that the server has
died and does log splitting, region reassignment, etc. Unfortunately, in this scenario, the
region server then comes back to life and flushes a few more writes to the log file, which
summarily get lost forever even though the client thinks they're committed. The regionserver
eventually "notices" that it lost its ZK session and shuts itself down, but in practice it
often has time to get off some last edits before doing so.

Clearly, using locks in ZK is subject to the same problem: our ZK coordination
is not synchronous with our storage access.
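
To make the timeline above concrete, here is a minimal sketch in plain Java. It is a toy model, not HBase or HDFS code: the class `ZombieWriterDemo`, the `UnfencedLog` type, and the edit names are all hypothetical, and the GC pause is simulated by simply ordering the steps. It only illustrates that a liveness check made before the pause says nothing about the state of the log at the moment the write lands.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the stale-writer race. Not HBase or HDFS code. */
public class ZombieWriterDemo {

    /** Hypothetical stand-in for a commit log that accepts writes from anyone. */
    static class UnfencedLog {
        final List<String> edits = new ArrayList<>();

        void append(String edit) {
            edits.add(edit);
        }
    }

    /** Replays the timeline and returns edits written after recovery read the log. */
    static List<String> lostEdits() {
        UnfencedLog log = new UnfencedLog();
        boolean sessionAlive = true;

        // Normal operation: the region server writes while its session is valid.
        log.append("edit-1");

        // The region server checks its session, which still looks alive.
        boolean checkedAlive = sessionAlive;

        // Simulated long GC pause: the session expires and the master
        // recovers (splits) the log as it exists right now.
        sessionAlive = false;
        List<String> recovered = new ArrayList<>(log.edits);

        // The region server resumes and acts on its stale check.
        if (checkedAlive) {
            log.append("edit-2");
        }

        // Anything in the log that recovery never saw is lost.
        List<String> lost = new ArrayList<>(log.edits);
        lost.removeAll(recovered);
        return lost;
    }

    public static void main(String[] args) {
        System.out.println("lost edits: " + lostEdits());
    }
}
```

In this model the only way to protect `edit-2` is for the storage layer itself to reject the write, which is the motivation for the two options below.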

There are two solutions I can think of here: (a) the "STONITH" technique ( http://en.wikipedia.org/wiki/STONITH ) -
we could run the regionservers in a container service which allows us to kill -9 the regionserver
when we think it should be dead. But this is obviously more complicated with regard to deployment,
additional RPCs, etc. (b) file access revocation - this is what we're trying to do with lease
recovery and what you're suggesting should not be possible.

Here's a question - as you described it, the original lease holder and the recovering lease
holder race to recover the lease. If the original holder wins the recovery, are we guaranteed
that no intervening appends have occurred? E.g., what happens if the recovering process wins,
opens the file for append, and immediately closes it? Are we guaranteed then that another
flush() call from the client at that point would definitely fail, or can it transparently
regain the lease from the now-closed file?
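
The scenario in this question can be sketched as a toy model. Everything here is hypothetical (the `ToyFile` class, the `reacquireAfterClose` switch, the client names) and it makes no claim about what the NameNode actually does; which of the two behaviors HDFS exhibits is exactly the open question. The sketch only shows why the answer matters: if a closed file lets the old client transparently take the lease back, the stale flush succeeds and revocation provides no fencing.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of lease revocation. Not HDFS code; all names are hypothetical. */
public class LeaseFencingDemo {

    static class ToyFile {
        private final boolean reacquireAfterClose; // hypothetical policy switch
        private String holder;                     // null while the file is closed
        final List<String> edits = new ArrayList<>();

        ToyFile(String initialHolder, boolean reacquireAfterClose) {
            this.holder = initialHolder;
            this.reacquireAfterClose = reacquireAfterClose;
        }

        /** Recovery triggered by a new writer: the lease moves to that writer. */
        void recoverAndAppend(String newHolder) {
            holder = newHolder;
        }

        /** Closing releases the lease if the caller holds it. */
        void close(String client) {
            if (client.equals(holder)) {
                holder = null;
            }
        }

        /** Returns true if the write was accepted. */
        boolean flush(String client, String edit) {
            if (holder == null && reacquireAfterClose) {
                holder = client; // the "transparently regain the lease" behavior
            }
            if (!client.equals(holder)) {
                return false;    // fenced: caller no longer owns the lease
            }
            edits.add(edit);
            return true;
        }
    }

    /** Replays: old writer writes, recoverer appends and closes, old writer flushes again. */
    static boolean staleFlushSucceeds(boolean reacquireAfterClose) {
        ToyFile file = new ToyFile("regionserver", reacquireAfterClose);
        file.flush("regionserver", "edit-1");
        file.recoverAndAppend("master");
        file.close("master");
        return file.flush("regionserver", "edit-2");
    }

    public static void main(String[] args) {
        System.out.println("with transparent re-acquire: " + staleFlushSucceeds(true));
        System.out.println("without re-acquire:          " + staleFlushSucceeds(false));
    }
}
```

For the HBase use case described above, only the second behavior (the stale flush fails) would make lease recovery usable as a fence.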

> Lease recovery doesn't reassign lease when triggered by append()
> ----------------------------------------------------------------
>                 Key: HDFS-1142
>                 URL: https://issues.apache.org/jira/browse/HDFS-1142
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-1142.txt, hdfs-1142.txt
> If a soft lease has expired and another writer calls append(), it triggers lease recovery
> but doesn't reassign the lease to a new owner. Therefore, the old writer can continue to allocate
> new blocks, try to steal back the lease, etc. This is for the testRecoveryOnBlockBoundary
> case of HDFS-1139

