hadoop-mapreduce-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5364) Deadlock between RenewalTimerTask methods cancel() and run()
Date Mon, 08 Jul 2013 21:49:49 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702489#comment-13702489 ]

Karthik Kambatla commented on MAPREDUCE-5364:

bq. A cancelled flag could be used on the DelegationTokenToRenew structure itself. Set intent
to cancel before attempting to cancel the timer task, and check this during renewal and before
queuing another renewal.

On second thought, I don't think we should add a cancelled flag to DelegationTokenToRenew to address
a synchronization issue in RenewalTimerTask.

Thinking more about this, the cleanest approach seems to be:
# Serialize timer cancellation and token renewal - via synchronization
# On successful token renewal, call {{setTimerForTokenRenewal}}

Uploaded patch (addendum-2) to implement this. Also, moved the token renewal code from RenewalTimerTask#run
to DelegationTokenToRenew#renew for clarity.
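
For illustration, the two steps above might look roughly like the sketch below. This is not the code in addendum-2: the class and method names echo the ones used in this thread, but every body, the {{cancelTimer}}, {{setTimerTask}}, {{currentTask}} and {{renewCount}} helpers, the fixed renewal interval, and the signature of {{setTimerForTokenRenewal}} are assumptions made only to show the locking shape.

```java
import java.util.Timer;
import java.util.TimerTask;

/**
 * Illustrative sketch only. Names echo the discussion, but every body
 * here is an assumption, not the code in the attached patch.
 */
class RenewalSketch {
  // Daemon timer so the sketch never keeps the JVM alive.
  private static final Timer RENEWAL_TIMER = new Timer("renewal-sketch", true);

  /** Hypothetical stand-in for RenewalTimerTask. */
  static class RenewalTimerTask extends TimerTask {
    private final DelegationTokenToRenew dttr;
    private volatile boolean cancelled = false;

    RenewalTimerTask(DelegationTokenToRenew dttr) { this.dttr = dttr; }

    // Deliberately not synchronized: all locking happens on the token.
    @Override public void run() { dttr.renew(this); }

    @Override public boolean cancel() {
      cancelled = true;
      return super.cancel();
    }

    boolean isCancelled() { return cancelled; }
  }

  /** Hypothetical stand-in for DelegationTokenToRenew. */
  static class DelegationTokenToRenew {
    private final long renewIntervalMs;  // assumed fixed interval
    private RenewalTimerTask timerTask;  // guarded by "this"
    private int renewCount;              // guarded by "this"

    DelegationTokenToRenew(long renewIntervalMs) {
      this.renewIntervalMs = renewIntervalMs;
    }

    /** Step 1: renewal holds the token's monitor for its whole duration. */
    synchronized void renew(RenewalTimerTask caller) {
      if (caller.isCancelled()) {
        return; // cancelled before this run acquired the monitor
      }
      renewCount++; // placeholder for the real token renewal call
      // Step 2: only a successful renewal queues the next one.
      setTimerForTokenRenewal(this);
    }

    /** Step 1: cancellation takes the same monitor as renew(). */
    synchronized void cancelTimer() {
      if (timerTask != null) {
        timerTask.cancel();
        timerTask = null;
      }
    }

    synchronized void setTimerTask(RenewalTimerTask task) { timerTask = task; }
    synchronized RenewalTimerTask currentTask() { return timerTask; }
    synchronized int renewCount() { return renewCount; }
  }

  /** Schedules the next renewal; the real method's signature may differ. */
  static RenewalTimerTask setTimerForTokenRenewal(DelegationTokenToRenew dttr) {
    RenewalTimerTask task = new RenewalTimerTask(dttr);
    dttr.setTimerTask(task);
    RENEWAL_TIMER.schedule(task, dttr.renewIntervalMs);
    return task;
  }
}
```

In this sketch, {{run()}} and {{cancel()}} on the timer task are not synchronized themselves, so neither path holds the task's monitor while waiting for another lock. The failure path (removing a token whose renewal failed) is left out here.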

[~sseth], could you take a look at the latest patch?

> Deadlock between RenewalTimerTask methods cancel() and run()
> ------------------------------------------------------------
>                 Key: MAPREDUCE-5364
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5364
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>             Fix For: 1.2.1
>         Attachments: mr-5364-1.patch, mr-5364-addendum-1.patch, mr-5364-addendum-2.patch
> MAPREDUCE-4860 introduced a local variable {{cancelled}} in {{RenewalTimerTask}} to fix
> the race where {{DelegationTokenRenewal}} attempts to renew a token even after the job is
> removed. However, the patch also makes {{run()}} and {{cancel()}} synchronized methods leading
> to a potential deadlock against {{run()}}'s catch-block (error-path).
> The deadlock stacks below:
> {noformat}
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.cancel() @bci=0, line=240 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeDelegationTokenRenewalForJob(org.apache.hadoop.mapreduce.JobID) @bci=109, line=319 (Interpreted frame)
> {noformat}
> {noformat}
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeFailedDelegationToken(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=62, line=297 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.access$300(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=1, line=47 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.run() @bci=148, line=234 (Interpreted frame)
> {noformat}
