hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13112) Token expiration edits may cause log corruption or deadlock
Date Thu, 15 Feb 2018 18:19:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366049#comment-16366049 ]

Daryn Sharp commented on HDFS-13112:

Xiao, good questions.

Yes, typically the edit log should not be synced while holding the lock, especially the write
lock, because it stalls everything.  Simple answer: the sync was already there.  Infrequent
syncs under the read lock are probably "ok", but I could just remove the sync.  The only
"risk" is issuing tokens with a lost key, which isn't an issue: if the token is synced, its
secret was implicitly synced.
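The "implicitly synced" argument relies on syncs covering a prefix of the log: flushing up to a transaction ID also persists every edit logged before it. A minimal Python sketch of that property, assuming a toy in-memory log; `ToyEditLog`, `log_edit`, `log_sync`, and the op strings are hypothetical stand-ins, not the real FSEditLog API:

```python
class ToyEditLog:
    """Hypothetical in-memory edit log: each edit gets the next txid, and
    log_sync(txid) makes every buffered edit with a txid <= txid durable."""

    def __init__(self):
        self.last_txid = 0
        self.buffered = []  # (txid, op) pairs not yet durable
        self.durable = []   # (txid, op) pairs that would survive a crash

    def log_edit(self, op):
        self.last_txid += 1
        self.buffered.append((self.last_txid, op))
        return self.last_txid

    def log_sync(self, txid):
        # Prefix-based: everything logged up to txid is flushed together.
        still_buffered = []
        for entry in self.buffered:
            if entry[0] <= txid:
                self.durable.append(entry)
            else:
                still_buffered.append(entry)
        self.buffered = still_buffered


log = ToyEditLog()
log.log_edit("UPDATE_MASTER_KEY key=7")  # logged, but never synced on its own
token_txid = log.log_edit("GET_DELEGATION_TOKEN key=7")
log.log_sync(token_txid)  # only the token edit is synced explicitly
durable_ops = [op for _, op in log.durable]
print(durable_ops)  # ['UPDATE_MASTER_KEY key=7', 'GET_DELEGATION_TOKEN key=7']
```

Because a token can only be issued with a key that was logged earlier, any sync that makes the token durable also makes its key durable in this model.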

Expiry edits don't need a sync for the reason you state.  Failover will expire them.  Unlike
an explicit cancel, an expiry isn't essential for consistency.
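The difference between a lost expiry and a lost cancel can be shown with a toy replay. A minimal Python sketch, assuming the new active rebuilds its token table only from durable edits and then runs its own expiry sweep; `replay`, `failover`, the op strings, and the integer timestamps are all hypothetical:

```python
def replay(durable_edits):
    """Rebuild the token table (token id -> expiry time) from durable edits."""
    tokens = {}
    for op, token_id, expiry in durable_edits:
        if op == "GET_TOKEN":
            tokens[token_id] = expiry
        elif op in ("CANCEL_TOKEN", "EXPIRE_TOKEN"):
            tokens.pop(token_id, None)
    return tokens


def failover(durable_edits, now):
    """New active replays the log, then its own expiry sweep drops stale tokens."""
    tokens = replay(durable_edits)
    return {tid: exp for tid, exp in tokens.items() if exp > now}


issued = [("GET_TOKEN", "a", 100), ("GET_TOKEN", "b", 500)]

# At now=200 the old active expired "a" and a client cancelled "b",
# but neither edit was synced before the failover.
lost_both = failover(issued, now=200)
print(lost_both)  # {'b': 500}

# If the cancel had been synced, "b" stays cancelled.
synced_cancel = failover(issued + [("CANCEL_TOKEN", "b", 500)], now=200)
print(synced_cancel)  # {}
```

Losing the unsynced expiry of "a" changes nothing, since the sweep removes it anyway; losing the unsynced cancel of "b" makes a cancelled token valid again until time 500.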

> Token expiration edits may cause log corruption or deadlock
> -----------------------------------------------------------
>                 Key: HDFS-13112
>                 URL: https://issues.apache.org/jira/browse/HDFS-13112
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.1.0-beta, 0.23.8
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-13112.1.patch, HDFS-13112.patch
>
> HDFS-4477 specifically did not acquire the fsn (FSNamesystem) lock during token
> cancellation, based on the belief that edit logs are thread-safe.  However, log rolling
> is not thread-safe.  Failure to externally synchronize on the fsn lock during a roll
> will cause problems.
>
> For sync edit logging, it may cause corruption by interspersing edits with the
> end/start segment edits.  Async edit logging may encounter a deadlock if the log queue
> overflows.  Luckily, losing the race is extremely rare: in ~5 years, we've never
> encountered it.  However, HDFS-13051 lost the race with async edits.
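The sync-logging race can be reproduced in a toy model. A minimal Python sketch, assuming each individual append is atomic (the "edit logs are thread-safe" belief) while a roll is two separate appends, and assuming the roll runs under the fsn write lock and a fixed token path takes the read lock; `RWLock`, `RollingEditLog`, `well_formed`, `run`, and the op strings are hypothetical stand-ins, this is not the HDFS-13112 patch, and the async-queue deadlock is not modeled:

```python
import threading
from contextlib import contextmanager

class RWLock:
    """Minimal readers-writer lock standing in for the fsn lock."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()

class RollingEditLog:
    """Each append is atomic, but a roll is two appends: END then START."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.ops = ["START_SEGMENT"]

    def log_edit(self, op):
        with self._mutex:
            self.ops.append(op)

    def roll(self, mid_roll=lambda: None):
        self.log_edit("END_SEGMENT")
        mid_roll()  # hook: lets another thread run in the middle of the roll
        self.log_edit("START_SEGMENT")

def well_formed(ops):
    """True if markers alternate START/END and every other edit is inside a segment."""
    in_segment = False
    for op in ops:
        if op in ("START_SEGMENT", "END_SEGMENT"):
            # START is only valid outside a segment, END only inside one.
            if in_segment != (op == "END_SEGMENT"):
                return False
            in_segment = op == "START_SEGMENT"
        elif not in_segment:
            return False
    return True

def run(use_fsn_lock):
    fsn = RWLock()
    log = RollingEditLog()
    workers = []

    def cancel_token():
        if use_fsn_lock:
            with fsn.read():  # token edit waits while the roll holds the write lock
                log.log_edit("CANCEL_TOKEN")
        else:
            log.log_edit("CANCEL_TOKEN")  # HDFS-4477 behavior: no fsn lock

    def mid_roll():
        t = threading.Thread(target=cancel_token)
        t.start()
        workers.append(t)
        if not use_fsn_lock:
            t.join()  # force the losing interleaving deterministically

    with fsn.write():  # the roll runs under the fsn write lock
        log.roll(mid_roll)
    for t in workers:
        t.join()
    return log.ops

print(run(use_fsn_lock=False))  # ['START_SEGMENT', 'END_SEGMENT', 'CANCEL_TOKEN', 'START_SEGMENT']
print(run(use_fsn_lock=True))   # ['START_SEGMENT', 'END_SEGMENT', 'START_SEGMENT', 'CANCEL_TOKEN']
```

Without the lock, the cancel edit lands between the end and start markers, outside any segment; with it, the cancel waits for the roll and lands in the new segment. A read lock suffices in this model because only the roll takes the write lock, so token edits can still run concurrently with each other.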

This message was sent by Atlassian JIRA
