hadoop-mapreduce-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5364) Deadlock between RenewalTimerTask methods cancel() and run()
Date Thu, 11 Jul 2013 20:01:48 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706161#comment-13706161 ]

Siddharth Seth commented on MAPREDUCE-5364:

bq. Looking at the code, I don't see a deadlock possibility. While a call to setTimerForTokenRenewal
requires a lock on DelegationTokenRenewer.class, I don't see any method holding a lock on
DelegationTokenRenewer.class requiring a lock on delegationTokens or cancelled flag. Am I
missing something here?

You're right. I had mistakenly assumed removeDelegationTokenRenewalForJob was a synchronized
method. Sorry about that.

This could be fixed via the original jira (MAPREDUCE-4860) or a new jira. The deadlock was
the main issue in this jira, and it is already fixed. An extra renewal just leads
to an additional exception message in the logs, correct? Or is it more severe than that (other
than the failed unit test)?

Comments on the patch itself.
The previous patch is likely better. One concern with the current patch: 'cancelled' is associated
with the current RenewalTimerTask. If removeDelegationTokenRenewalForJob calls cancel()
while a token renewal is in progress, the call effectively has no effect, since a new RenewalTimerTask
would be scheduled. This may not be an issue, since the reference to the DelegationTokenToRenew
object will be removed from the list of delegationTokens. Since renew has been moved into
DelegationTokenToRenew, I'd prefer having the cancel / intent to cancel associated with that
as well.
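
To make that last preference concrete, here is a minimal sketch. It is not the actual patch: TokenToRenew, RenewalTask and schedule() are hypothetical stand-ins, and renew() only counts calls instead of contacting a token service. The point is that the cancel flag lives on the token object, so it survives one timer task being replaced by the next.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class CancelOnTokenSketch {

    /** Hypothetical stand-in for DelegationTokenToRenew; it owns the cancel intent. */
    static class TokenToRenew {
        private final AtomicBoolean cancelled = new AtomicBoolean(false);
        final AtomicInteger renewals = new AtomicInteger();
        private volatile TimerTask timerTask;

        /** Records the intent to cancel and cancels whichever task is currently set. */
        void cancel() {
            cancelled.set(true);
            TimerTask current = timerTask;
            if (current != null) {
                current.cancel();
            }
        }

        boolean isCancelled() {
            return cancelled.get();
        }

        /** Placeholder for the real renewal; refuses to renew once cancelled. */
        boolean renew() {
            if (cancelled.get()) {
                return false;
            }
            renewals.incrementAndGet();
            return true;
        }

        void setTimerTask(TimerTask task) {
            timerTask = task;
        }
    }

    /** Hypothetical stand-in for RenewalTimerTask; it holds no cancel state of its own. */
    static class RenewalTask extends TimerTask {
        private final TokenToRenew token;
        private final Timer timer;
        private final long delayMs;

        RenewalTask(TokenToRenew token, Timer timer, long delayMs) {
            this.token = token;
            this.timer = timer;
            this.delayMs = delayMs;
        }

        @Override
        public void run() {
            if (!token.renew()) {
                return; // cancelled: neither renew nor reschedule
            }
            schedule(token, timer, delayMs);
        }
    }

    /** Schedules the next renewal unless the token has already been cancelled. */
    static void schedule(TokenToRenew token, Timer timer, long delayMs) {
        if (token.isCancelled()) {
            return;
        }
        RenewalTask next = new RenewalTask(token, timer, delayMs);
        token.setTimerTask(next);
        timer.schedule(next, delayMs);
    }
}
```

If cancel() lands between the isCancelled() check and timer.schedule(), the new task may still fire once, but renew() then sees the flag and nothing further is scheduled. No synchronized methods are involved in this sketch, so it does not reintroduce the lock ordering problem.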
> Deadlock between RenewalTimerTask methods cancel() and run()
> ------------------------------------------------------------
>                 Key: MAPREDUCE-5364
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5364
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>             Fix For: 1.2.1
>         Attachments: mr-5364-1.patch, mr-5364-addendum-1.patch, mr-5364-addendum-2.patch
> MAPREDUCE-4860 introduced a local variable {{cancelled}} in {{RenewalTimerTask}} to fix the race where {{DelegationTokenRenewal}} attempts to renew a token even after the job is removed. However, the patch also makes {{run()}} and {{cancel()}} synchronized methods leading to a potential deadlock against {{run()}}'s catch-block (error-path).
> The deadlock stacks below:
> {noformat}
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.cancel() @bci=0, line=240 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeDelegationTokenRenewalForJob(org.apache.hadoop.mapreduce.JobID) @bci=109, line=319 (Interpreted frame)
> {noformat}
> {noformat}
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeFailedDelegationToken(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=62, line=297 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.access$300(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=1, line=47 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.run() @bci=148, line=234 (Interpreted frame)
> {noformat}
