Mailing-List: contact dev-help@geode.apache.org; run by ezmlm
Delivered-To: mailing list dev@geode.apache.org
Content-Type: multipart/alternative; boundary="===============6844862324099768554=="
MIME-Version: 1.0
Subject: Re: Review Request 60570: GEODE-3153 Client receives duplicate events during rolling upgrade
From: Bruce Schuchardt
To: Galen O'Sullivan, Hitesh Khamesra, Barry Oglesby, Brian Rowe, Alexander Murmann
Cc: Dan Smith, geode, Bruce Schuchardt
Date: Fri, 30 Jun 2017 23:15:13 -0000
Message-ID: <20170630231513.20232.15582@reviews-vm2.apache.org>
References: <20170630215612.20232.63388@reviews-vm2.apache.org>
In-Reply-To: <20170630215612.20232.63388@reviews-vm2.apache.org>
Reply-To: Bruce Schuchardt
X-ReviewRequest-URL: https://reviews.apache.org/r/60570/
X-ReviewRequest-Repository: geode
archived-at: Fri, 30 Jun 2017 23:15:20 -0000

--===============6844862324099768554==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

> On June 30, 2017, 2:56 p.m., Galen O'Sullivan wrote:
> > I've been looking at this with @WireBaron, and we're wondering whether a client membership ID can still get sent with a zeroed UUID if it's passed between two Gemfire 9.1 servers as a result of client queue replication failover. We tried to write a test and failed.
> >
> > The basic idea is something like this:
> > * start two servers, an interested client and an event-creating client.
> > One server is running 9.0 and the other 9.1.
> > * put a couple of events in the system via the 9.0 server (should be fine with either).
> > * kill the 9.0 server and add a new 9.1 server to the system.
> > At this point, if we're understanding client queue replication correctly, the new server should receive a copy of the queue from the other 9.1 server.
> > * Check the same events on the new server to see if they've lost the UUID.
> >
> > Does that sound reasonable to you?
> >
> > I'm not sure how to trigger failures in the right order to verify this isn't an issue, but I think it's reasonably plausible that during rolling upgrades someone could encounter the issue. It would require an old client to get a queue from a new version server that has been passed that queue by another new version server.
> >
> > If that's not possible to trigger, or you can't test it and are confident that the scenario we described won't happen, then go ahead and ship it.

Please use Geode version numbers. I modified my new test to do as you described and it passed, as I expected it to. During image transfer the first 1.2.0 server would just transmit the membershipID bytes in its toData method since the target is a 1.2.0 server. The second server would preserve the UUID bytes when sending the eventID to the 1.0.0 client.

- Bruce


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60570/#review179384
-----------------------------------------------------------


On June 30, 2017, 3:02 p.m., Bruce Schuchardt wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60570/
> -----------------------------------------------------------
>
> (Updated June 30, 2017, 3:02 p.m.)
>
>
> Review request for geode, Alexander Murmann, Barry Oglesby, Galen O'Sullivan, Hitesh Khamesra, and Brian Rowe.
>
>
> Bugs: GEODE-3153
>     https://issues.apache.org/jira/browse/GEODE-3153
>
>
> Repository: geode
>
>
> Description
> -------
>
> Another problem was found in backward-compatibility testing. If a 1.0.0 client was receiving subscription events generated by a 1.0.0 peer "feeder" member and the events were routed through a 1.0.0 server, the client might see duplicate events when the server is stopped and the client fails over to a 1.2.0 server holding its redundant subscription queue. This is especially likely if a large "ack" period is established in the client.
>
> The problem stems from the EventID deserialization/reserialization of the memberID bytes when sending to a 1.0 client. It was deserializing using Version.CURRENT, which ignores the UUID bytes in the serialized ID. It then serialized the identifier using the client's version, which includes the UUID bytes, but those bytes were zero because of the version used in deserialization.
>
>
> Diffs
> -----
>
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java bc3d708da2ae9a8e386accb8d15e2ed49123241e
>   geode-core/src/main/java/org/apache/geode/internal/Version.java 557697159da644915e4ffe2405cdddc9ef37c5ac
>   geode-core/src/main/java/org/apache/geode/internal/cache/EventID.java 55c89f1f2e0800371dd4a30c4312c44f942a45ea
>   geode-core/src/test/java/org/apache/geode/internal/cache/tier/sockets/ClientServerMiscBCDUnitTest.java bc48d976096fafe2545e707da68dab5120ddca51
>   geode-core/src/test/java/org/apache/geode/internal/cache/tier/sockets/ClientServerMiscDUnitTest.java bfe4646b9abdf6075e8d30fab3d79924faade2aa
>   geode-core/src/test/resources/org/apache/geode/codeAnalysis/sanctionedDataSerializables.txt b69e004d63d74eccd5cd562ea269363ba3f2782e
>
>
> Diff: https://reviews.apache.org/r/60570/diff/2/
>
>
> Testing
> -------
>
> new unit tests, precheckin
>
>
> Thanks,
>
> Bruce Schuchardt
>
>

--===============6844862324099768554==--
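
The deserialize/reserialize mismatch described in the review request can be modeled with a small, self-contained sketch. Everything here is hypothetical and is not Geode's code: the class MemberIdRoundTrip, the version constants, and the wire layout (a 4-byte port followed by 16 UUID bytes in the older format only) are assumptions chosen to show the effect. Reading old-format bytes with the newer version drops the UUID, and writing the result back in the old format emits zeros in its place.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical, simplified model of a version-dependent member ID wire format. */
public class MemberIdRoundTrip {
  // Hypothetical version ordinals; these are not Geode's Version constants.
  static final int V1_0_0 = 1;
  static final int V1_2_0 = 2;

  /** Assumption for this sketch: only the older format carries the UUID inline. */
  static boolean carriesUuid(int version) {
    return version == V1_0_0;
  }

  static final class MemberId {
    final int port;
    final long uuidMsb;
    final long uuidLsb;

    MemberId(int port, long uuidMsb, long uuidLsb) {
      this.port = port;
      this.uuidMsb = uuidMsb;
      this.uuidLsb = uuidLsb;
    }
  }

  /** Serializes the ID in the layout the given version expects. */
  static byte[] write(MemberId id, int version) {
    ByteBuffer buf = ByteBuffer.allocate(carriesUuid(version) ? 20 : 4);
    buf.putInt(id.port);
    if (carriesUuid(version)) {
      buf.putLong(id.uuidMsb).putLong(id.uuidLsb);
    }
    return buf.array();
  }

  /** Deserializes with the given version; a version without UUIDs leaves them zero. */
  static MemberId read(byte[] bytes, int version) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int port = buf.getInt();
    if (carriesUuid(version)) {
      return new MemberId(port, buf.getLong(), buf.getLong());
    }
    return new MemberId(port, 0L, 0L); // trailing UUID bytes are ignored
  }

  /** Round-trips serialized bytes through an in-memory ID. */
  static byte[] reserialize(byte[] bytes, int readVersion, int writeVersion) {
    return write(read(bytes, readVersion), writeVersion);
  }

  public static void main(String[] args) {
    byte[] original = write(new MemberId(40404, 0x1234L, 0x5678L), V1_0_0);
    // Reading with the newer version and writing for the old client zeroes the UUID.
    byte[] zeroed = reserialize(original, V1_2_0, V1_0_0);
    // Reading with the version that produced the bytes keeps the UUID intact.
    byte[] preserved = reserialize(original, V1_0_0, V1_0_0);
    System.out.println("zeroed matches original: " + Arrays.equals(original, zeroed));
    System.out.println("preserved matches original: " + Arrays.equals(original, preserved));
  }
}
```

In this model the client would be handed an ID that no longer matches the one it saw before failover, which is consistent with the duplicate events described above. Whether the actual change reads with the sender's version or passes the original bytes through unchanged is not stated in this message; the diff linked in the review request is the authority on that.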