hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hrishikesh Gadre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14044) Synchronization issue in delegation token cancel functionality
Date Wed, 01 Feb 2017 03:18:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847916#comment-15847916

Hrishikesh Gadre commented on HADOOP-14044:

[~xiaochen] Thanks for the feedback. The patch-1 was just a starting point for the discussion
(it doesn't compile fully if you see the build output :)).

bq. So it seems there's no good way to satisfy your request of 'when 2 callers are cancelling,
exactly 1 should see success'. I guess this may end up inline with zookeeper's documentation
- client has to handle it.

I don't think that is quite accurate. I understand that if 2 callers invoke cancel operation
*concurrently* then it may not be possible to figure out which client actually deleted the
token. But in my case there is just a single caller invoking cancel operation one after another.
Hence I think it is a case of *application level* cache inconsistency which is being exposed
to the client.

If we don't want to binary compatibility at the API level, would it be ok to change the API
semantics ? e.g. an alternative would be for the server to return HTTP 200 in all cases (e.g.
instead of sending 404 in case of nonexistent token). This will ensure that server provides
consistent response to client regardless of cache inconsistency at the server. The [REST API
semantics|http://restcookbook.com/HTTP%20Methods/idempotency/] also seem to favor this approach.

What do you think?

> Synchronization issue in delegation token cancel functionality
> --------------------------------------------------------------
>                 Key: HADOOP-14044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14044
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Hrishikesh Gadre
>            Assignee: Hrishikesh Gadre
>         Attachments: dt_fail.log, dt_success.log, HADOOP-14044-001.patch
> We are using Hadoop delegation token authentication functionality in Apache Solr. As
part of the integration testing, I found following issue with the delegation token cancelation
> Consider a setup with 2 Solr servers (S1 and S2) which are configured to use delegation
token functionality backed by Zookeeper. Now invoke following steps,
> [Step 1] Send a request to S1 to create a delegation token.
>   (Delegation token DT is created successfully)
> [Step 2] Send a request to cancel DT to S2
>   (DT is canceled successfully. client receives HTTP 200 response)
> [Step 3] Send a request to cancel DT to S2 again
>   (DT cancelation fails. client receives HTTP 404 response)
> [Step 4] Send a request to cancel DT to S1
> At this point we get two different responses.
> - DT cancelation fails. client receives HTTP 404 response
> - DT cancelation succeeds. client receives HTTP 200 response
> Also as per the current implementation, each server maintains an in_memory cache of current
tokens which is updated using the ZK watch mechanism. e.g. the ZK watch on S1 will ensure
that the in_memory cache is synchronized after step 2.
> After investigation, I found the root cause for this behavior is due to the race condition
between step 4 and the firing of ZK watch on S1. Whenever the watch fires before the step
4 - we get HTTP 404 response (as expected). When that is not the case - we get HTTP 200 response
along with following ERROR message in the log,
> {noformat}
> Attempted to remove a non-existing znode /ZKDTSMTokensRoot/DT_XYZ
> {noformat}
> From client perspective, the server *should* return HTTP 404 error when the cancel request
is sent out for an invalid token.
> Ref: Here is the relevant Solr unit test for reference,
> https://github.com/apache/lucene-solr/blob/746786636404cdb8ce505ed0ed02b8d9144ab6c4/solr/core/src/test/org/apache/solr/cloud/TestSolrCloudWithDelegationTokens.java#L285

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message