hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14044) Synchronization issue in delegation token cancel functionality
Date Thu, 02 Feb 2017 01:11:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849212#comment-15849212

Xiao Chen commented on HADOOP-14044:

Thanks [~hgadre], I think this is the right direction, nice job. :)
Since {{getTokenInfo}} checks in-memory then queries ZK, and is again from that {{Public Evolving}}
ADTSM, throwing may not be as clean.

But I think we can replace this logic in {{cancelToken}} with a new one:
    try {
      if (!currentTokens.containsKey(id)) {
        // See if token can be retrieved and placed in currentTokens
      return super.cancelToken(token, canceller);
    } catch (Exception e) {
      LOG.error("Exception while checking if token exist !!", e);
      return id;

We can always check if token exists in ZK, IOE if not. This would give us the 404 you wanted.
If it exists, we proceed to add it to the {{currentTokens}}.

I would say this behavior change is a bug fix, so compatible. Only draw back is we have to
check ZK for each token cancellation, so more load on ZK and some perf decrease on the cancellation.
But don't think this is performance sensitive so trade off for correctness seems worthy.

> Synchronization issue in delegation token cancel functionality
> --------------------------------------------------------------
>                 Key: HADOOP-14044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14044
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Hrishikesh Gadre
>            Assignee: Hrishikesh Gadre
>         Attachments: dt_fail.log, dt_success.log, HADOOP-14044-001.patch
> We are using Hadoop delegation token authentication functionality in Apache Solr. As
part of the integration testing, I found following issue with the delegation token cancelation
> Consider a setup with 2 Solr servers (S1 and S2) which are configured to use delegation
token functionality backed by Zookeeper. Now invoke following steps,
> [Step 1] Send a request to S1 to create a delegation token.
>   (Delegation token DT is created successfully)
> [Step 2] Send a request to cancel DT to S2
>   (DT is canceled successfully. client receives HTTP 200 response)
> [Step 3] Send a request to cancel DT to S2 again
>   (DT cancelation fails. client receives HTTP 404 response)
> [Step 4] Send a request to cancel DT to S1
> At this point we get two different responses.
> - DT cancelation fails. client receives HTTP 404 response
> - DT cancelation succeeds. client receives HTTP 200 response
> Also as per the current implementation, each server maintains an in_memory cache of current
tokens which is updated using the ZK watch mechanism. e.g. the ZK watch on S1 will ensure
that the in_memory cache is synchronized after step 2.
> After investigation, I found the root cause for this behavior is due to the race condition
between step 4 and the firing of ZK watch on S1. Whenever the watch fires before the step
4 - we get HTTP 404 response (as expected). When that is not the case - we get HTTP 200 response
along with following ERROR message in the log,
> {noformat}
> Attempted to remove a non-existing znode /ZKDTSMTokensRoot/DT_XYZ
> {noformat}
> From client perspective, the server *should* return HTTP 404 error when the cancel request
is sent out for an invalid token.
> Ref: Here is the relevant Solr unit test for reference,
> https://github.com/apache/lucene-solr/blob/746786636404cdb8ce505ed0ed02b8d9144ab6c4/solr/core/src/test/org/apache/solr/cloud/TestSolrCloudWithDelegationTokens.java#L285

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message