hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-503) DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
Date Fri, 12 Apr 2013 23:20:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630749#comment-13630749
] 

Siddharth Seth commented on YARN-503:
-------------------------------------

bq. I don't like sleeps either. 1s is an eternity in this case because the initial renew and
cancel timer tasks fire immediately on mocked objects, so it should run in a few ms. I assume
you are suggesting using notify in a mock'ed answer method? Multiple timers are expected to
fire in some cases, so it would probably require something like a CountdownLatch, which will
get tricky to keep swapping in a new one by re-adding mocked responses with the new latch.
Let me know if you feel it's worth it to change it.
Was actually suggesting doing the post-sleep verify in a check-sleep loop, instead of just
sleeping. Passing this step indicates the required execution has completed. Would prefer keeping
a sleep out of the tests if we can. Otherwise a longer sleep for sure.

bq. It's ok if another task is executing, because it's just trying to abort any pending task.
Since there's only one possible pending task per token at any given time, the wrong task can't
be cancelled. Did I miss an edge case?
I think there's an edge case. Sequence
1. [t1] timerTask is a RenewalTask
2. [t1] timer kicks in and starts executing
3. [t2] scheduleCancelled gets called in a parallel thread [via AppRemovalTask]
4. [t2] scheduleCancelled.abortScheduled called - synchronized but does nothing useful since
the current task is already running.
5. [t2] scheduleCancelled runs to completion and creates a cancelTask
6. [t1] completes execution - and calls scheduleTask(new TokenRenewTask(), renewIn) - which
effectively destorys the scheduled cancelTask

bq. Are you suggesting to move managedToken.add(appId) into the loop in addApplication? I
was trying to encapsulate the implementation details of adding/removing the appId within ManagedApp.
Is it ok to leave it as-is?
I thought it'd be cleaner leaving this outside of ManagedApp - ManagedApp should not be managing
ManagedTokens. IAC, don't feel strongly about this; whatever you decide.
                
> DelegationTokens will be renewed forever if multiple jobs share tokens and the first
one sets JOB_CANCEL_DELEGATION_TOKEN to false
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-503
>                 URL: https://issues.apache.org/jira/browse/YARN-503
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha
>            Reporter: Siddharth Seth
>            Assignee: Daryn Sharp
>         Attachments: YARN-503.patch, YARN-503.patch
>
>
> The first Job/App to register a token is the one which DelegationTokenRenewer associates
with a a specific Token. An attempt to remove/cancel these shared tokens by subsequent jobs
doesn't work - since the JobId will not match.
> As a result, Even if subsequent jobs have MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set
to true - tokens will not be cancelled when those jobs complete.
> Tokens will eventually be removed from the RM / JT when the service that issued them
considers them to have expired or via an explicit cancelDelegationTokens call (not implemented
yet in 23).
> A side affect of this is that the same delegation token will end up being renewed multiple
times (a separate TimerTask for each job which uses the token).
> DelegationTokenRenewer could maintain a reference count/list of jobIds for shared tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message