geode-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Galen O'Sullivan <gosulli...@pivotal.io>
Subject Re: Review Request 60570: GEODE-3153 Client receives duplicate events during rolling upgrade
Date Fri, 30 Jun 2017 21:56:12 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60570/#review179384
-----------------------------------------------------------



I've been looking at this with @WireBaron, and we're wondering whether a client membership
ID can still get sent with a zeroed UUID if it's passed between two Gemfire 9.1 servers as
a result of client queue replication failover. We tried to write a test and failed.

The basic idea is something like this:
* start two servers, an interested client and an event-creating client.
  One server is running 9.0 and the other 9.1 .
* put a couple of events in the system via the 9.0 server (should be fine with either).
* kill the 9.0 server and add a new 9.1 server to the system.
  At this point, if we're understanding client queue replication correctly, the new server
should receive a copy of the queue from the other 9.1 server.
* Check the same events on the new server to see if they've lost the UUID.

Does that sound reasonable to you?

I'm not sure how to trigger failures in the right order to verify this isn't an issue, but
I think it's reasonably plausible that during rolling upgrades someone could encounter the
issue. It would require an old client to get a queue from a new version server that has been
passed that queue by another new version server.

If that's not possible to trigger, or you can't test it and are confident that the scenario
we described won't happen, then go ahead and ship it.


geode-core/src/main/java/org/apache/geode/internal/cache/EventID.java
Lines 301 (patched)
<https://reviews.apache.org/r/60570/#comment254050>

    Maybe declare `dis` and then set it based on the version number?
    
    If you're feeling charitable, you could rename `dis` to `dataInputStream`.



geode-core/src/test/java/org/apache/geode/internal/cache/tier/sockets/ClientServerMiscBCDUnitTest.java
Lines 105 (patched)
<https://reviews.apache.org/r/60570/#comment254064>

    I'm not all that familiar with backwards compatibility tests, but do we want to be testing
agains current version and testing version or current and 9.0? Will backwards compatibility
set `testVersion` to 9.0 among other versions?


- Galen O'Sullivan


On June 30, 2017, 4:33 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60570/
> -----------------------------------------------------------
> 
> (Updated June 30, 2017, 4:33 p.m.)
> 
> 
> Review request for geode, Alexander Murmann, Barry Oglesby, Galen O'Sullivan, Hitesh
Khamesra, and Brian Rowe.
> 
> 
> Bugs: GEODE-3153
>     https://issues.apache.org/jira/browse/GEODE-3153
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> Another problem was found in backward-compatibility testing.  If a 1.0.0 client was receiving
subscription events generated by a 1.0.0 peer "feeder" member and the events were routed through
a 1.0.0 server the client might see duplicate events when the server is stopped and the client
fails over to a 1.2.0 server holding its redundant subscription queue.  This is especially
possible if a large "ack" period is established in the client.
> 
> The problem stems from the EventID deserialization/reserialization of the memberID bytes
when sending to a 1.0 client.  It was deserializing using Version.CURRENT, which ignores the
UUID bytes in the serialized ID.  Then it serialized the identifier using the client's version,
which includes the UUID bytes but which are zero due to the version used in deserialization.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java
bc3d708da2ae9a8e386accb8d15e2ed49123241e 
>   geode-core/src/main/java/org/apache/geode/internal/cache/EventID.java 55c89f1f2e0800371dd4a30c4312c44f942a45ea

>   geode-core/src/test/java/org/apache/geode/internal/cache/tier/sockets/ClientServerMiscBCDUnitTest.java
bc48d976096fafe2545e707da68dab5120ddca51 
>   geode-core/src/test/java/org/apache/geode/internal/cache/tier/sockets/ClientServerMiscDUnitTest.java
bfe4646b9abdf6075e8d30fab3d79924faade2aa 
>   geode-core/src/test/resources/org/apache/geode/codeAnalysis/sanctionedDataSerializables.txt
b69e004d63d74eccd5cd562ea269363ba3f2782e 
> 
> 
> Diff: https://reviews.apache.org/r/60570/diff/1/
> 
> 
> Testing
> -------
> 
> new unit tests, precheckin
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message