geode-issues mailing list archives

From "Hitesh Khamesra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GEODE-697) A client thread timing out an operation and performing further operations can result in cache inconsistency
Date Tue, 15 Mar 2016 23:27:33 GMT

    [ https://issues.apache.org/jira/browse/GEODE-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196470#comment-15196470
] 

Hitesh Khamesra commented on GEODE-697:
---------------------------------------

[~bschuchardt] What if we skip the eventId check on the secondary and just apply the event,
since it is coming from the primary bucket? Could the RVV then play any role on the secondary for that event?

> A client thread timing out an operation and performing further operations can result
in cache inconsistency
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-697
>                 URL: https://issues.apache.org/jira/browse/GEODE-697
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Dan Smith
>            Assignee: Bruce Schuchardt
>
> There is a case where the primary and secondary buckets of a partitioned region can become
out of sync if a client times out while waiting for a slow operation to finish. Here's the
scenario:
> 1. An operation is started by the client and gets stuck on the server, for example by
a slow cache writer. That operation is assigned an EventID with a sequence number of 1.
> 2. The client times out.
> 3. The client performs a second operation. That operation gets assigned an EventID with
a sequence number of 2.
> 4. The second operation is applied on all members. The EventTracker records the sequence
number 2.
> 5. The original operation continues. It is applied to the primary (because it has passed
the EventTracker test).
> 6. The original operation is rejected by the EventTracker on the secondary. The two copies
of the bucket are now inconsistent.
> One possible fix is to change the thread id of the thread on the client when the client
operation times out. That would ensure that the EventTracker will not reject the original
operation when it finally goes through, because it has a different thread id.
> If an operation is delayed on the server, for example by a very slow cache writer, the
operation can time out on the client.
> The client can then go on and perform a second operation.
> The problem is that each operation is assigned an event id which is a combination of
the client's thread id and a sequence number. That second operation has a higher sequence number.
> Once the second operation is applied to a region on a given member, the event is stored
in the EventTracker and that member will reject any lower sequence numbers.
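
The scenario quoted above can be reproduced in a small, self-contained Java sketch. This is
not Geode code: SketchTracker, SketchMember and TimeoutScenario are hypothetical names, the
thread ids 7 and 8 are arbitrary, and the tracker is reduced to "remember the highest sequence
number seen per thread id and reject anything not higher". It also assumes that the second
operation keeps sequence number 2 when the client switches to a new thread id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-member tracking described in the issue;
// this is NOT Geode's EventTracker class.
final class SketchTracker {
    private final Map<Long, Long> highestSeqByThread = new HashMap<>();

    // Accept an event only if its sequence number is higher than the
    // highest one already recorded for the same thread id.
    boolean accept(long threadId, long sequence) {
        Long last = highestSeqByThread.get(threadId);
        if (last != null && sequence <= last) {
            return false;
        }
        highestSeqByThread.put(threadId, sequence);
        return true;
    }
}

// Hypothetical member hosting one copy of a bucket.
final class SketchMember {
    final SketchTracker tracker = new SketchTracker();
    final Map<String, String> bucket = new HashMap<>();

    boolean apply(long threadId, long sequence, String key, String value) {
        if (!tracker.accept(threadId, sequence)) {
            return false; // rejected as an old event
        }
        bucket.put(key, value);
        return true;
    }
}

final class TimeoutScenario {
    // Returns true if primary and secondary hold the same data at the end.
    static boolean run(boolean newThreadIdAfterTimeout) {
        SketchMember primary = new SketchMember();
        SketchMember secondary = new SketchMember();
        long originalThread = 7L; // arbitrary id for this sketch

        // Step 1: operation 1 passes the tracker on the primary, then stalls
        // (for example in a slow cache writer) before being applied.
        boolean passedOnPrimary = primary.tracker.accept(originalThread, 1);

        // Steps 2-3: the client times out and issues a second operation,
        // either on the same thread id or, with the proposed fix, a new one.
        long secondThread = newThreadIdAfterTimeout ? 8L : originalThread;

        // Step 4: the second operation is applied on all members.
        primary.apply(secondThread, 2, "k", "second");
        secondary.apply(secondThread, 2, "k", "second");

        // Step 5: the original operation resumes on the primary; it has
        // already passed the tracker check there.
        if (passedOnPrimary) {
            primary.bucket.put("k", "first");
        }

        // Step 6: the secondary runs its own tracker check on the original event.
        secondary.apply(originalThread, 1, "k", "first");

        return primary.bucket.equals(secondary.bucket);
    }

    public static void main(String[] args) {
        System.out.println("same thread id: consistent=" + run(false));
        System.out.println("new thread id:  consistent=" + run(true));
    }
}
```

In this sketch, reusing the thread id leaves the primary with "first" and the secondary with
"second", while switching thread ids leaves both copies with "first". The copies then agree,
although the late first operation still overwrites the second one on both members.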



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
