hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13112) Token expiration edits may cause log corruption or deadlock
Date Wed, 07 Feb 2018 18:24:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355844#comment-16355844 ]

Kihwal Lee commented on HDFS-13112:

The patch looks good.
- The addition of read locks ensures these edit logging activities do not collide with edit
rolling or HA transitions (in addition to the level of safety provided by {{noInterruptsLock}}).
- A write lock is not required, since these operations don't change any state that other
threads are accessing with a read lock.

Since only the secret manager logs edits under a read lock and all other callers use a write
lock, there can be no concurrent edit logging. This covers the general {{FSEditLog}} thread
safety issue, not only the issue between logging and rolling.
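
The locking argument above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the HDFS code: {{FsnLockSketch}}, its method names and the string edit records are hypothetical, and a plain {{ReentrantReadWriteLock}} stands in for the fsn lock. It also assumes that exactly one thread (the secret manager) ever logs edits while holding only the read lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the namesystem lock plus edit log; not the HDFS classes.
class FsnLockSketch {
    // Fair read/write lock playing the role of the fsn lock (assumption).
    private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock(true);
    // Not thread-safe by itself; safety comes entirely from the locking discipline.
    private final List<String> editLog = new ArrayList<>();

    // Secret manager path: assumed to be the ONLY caller logging under the read lock.
    public void logTokenExpiration(String tokenId) {
        fsnLock.readLock().lock();
        try {
            editLog.add("CANCEL_TOKEN " + tokenId);
        } finally {
            fsnLock.readLock().unlock();
        }
    }

    // Every other edit takes the write lock, which also excludes the secret manager.
    public void logNamespaceEdit(String op) {
        fsnLock.writeLock().lock();
        try {
            editLog.add(op);
        } finally {
            fsnLock.writeLock().unlock();
        }
    }

    // Rolling holds the write lock across both records, so no read-locked
    // edit can land between the end and start segment markers.
    public void rollEditLog() {
        fsnLock.writeLock().lock();
        try {
            editLog.add("END_SEGMENT");
            editLog.add("START_SEGMENT");
        } finally {
            fsnLock.writeLock().unlock();
        }
    }

    // Copy of the log, taken under the write lock so no edit is in flight.
    public List<String> snapshot() {
        fsnLock.writeLock().lock();
        try {
            return new ArrayList<>(editLog);
        } finally {
            fsnLock.writeLock().unlock();
        }
    }

    // True if every END_SEGMENT is immediately followed by START_SEGMENT.
    public static boolean segmentsIntact(List<String> log) {
        for (int i = 0; i < log.size(); i++) {
            if (log.get(i).equals("END_SEGMENT")
                    && (i + 1 >= log.size() || !log.get(i + 1).equals("START_SEGMENT"))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        FsnLockSketch fsn = new FsnLockSketch();
        Thread secretManager = new Thread(() -> {
            for (int i = 0; i < 1000; i++) fsn.logTokenExpiration("token-" + i);
        });
        Thread roller = new Thread(() -> {
            for (int i = 0; i < 100; i++) fsn.rollEditLog();
        });
        secretManager.start();
        roller.start();
        secretManager.join();
        roller.join();
        System.out.println("segments intact: " + segmentsIntact(fsn.snapshot()));
    }
}
```

In this sketch, a second thread logging under the read lock would be able to run concurrently with the secret manager, which is why the argument depends on the secret manager being the only read-locked logger.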

Now, if we believe that it is only unsafe between edit logging and rolling (i.e. normal edit
logging activities are thread safe), we could make {{getDelegationToken()}}, {{renewDelegationToken()}}
and {{cancelDelegationToken()}} acquire a read lock. And perhaps lease-related calls too.
Any thoughts on this?

In any case, I'm +1 on the patch. If you think we can make additional locking changes, please
file a follow-up jira.

> Token expiration edits may cause log corruption or deadlock
> -----------------------------------------------------------
>                 Key: HDFS-13112
>                 URL: https://issues.apache.org/jira/browse/HDFS-13112
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.1.0-beta, 0.23.8
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-13112.patch
> HDFS-4477 specifically did not acquire the fsn lock during token cancellation based on
> the belief that edit logs are thread-safe.  However, log rolling is not thread-safe.  Failure
> to externally synchronize on the fsn lock during a roll will cause problems.
> For sync edit logging, it may cause corruption by interspersing edits with the end/start
> segment edits.  Async edit logging may encounter a deadlock if the log queue overflows.  Luckily,
> losing the race is extremely rare.  In ~5 years, we've never encountered it.  However, HDFS-13051
> lost the race with async edits.
