From: Vladimir Ozerov
Date: Thu, 13 Dec 2018 23:40:05 +0300
Subject: Re: Continuous queries and duplicates
To: dev@ignite.apache.org

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html

On Thu, Dec 13, 2018 at 11:38 PM Vladimir Ozerov wrote:

> Denis,
>
> Not really. They are used to ensure that the ordering of notifications is
> consistent with the ordering of updates, so that when a key K is updated to
> V1, then V2, then V3, you never observe V1 -> V3 -> V2. They also solve the
> duplicate notification problem in case of node failures, when the same
> update is delivered twice.
>
> However, partition counters are unable to solve the duplicates problem in
> general. Essentially, the question is how to get a consistent view of some
> data plus all notifications that happened afterwards. There are only two
> ways to achieve this: either lock entries during the initial query, or take
> some kind of consistent data snapshot. The former was never implemented in
> Ignite: our Scan and SQL queries do not use locking. The latter is
> achievable in theory with MVCC. I raised that question earlier [1] (see
> p.2), and we came to the conclusion that it might be a good feature for the
> product. It is not implemented that way for MVCC now, but it is most
> probably not extraordinarily difficult to implement.
>
> Vladimir.
>
> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html#a33998
>
> On Thu, Dec 13, 2018 at 11:17 PM Denis Magda wrote:
>
>> Vladimir,
>>
>> The partition counter is supposed to be used internally to solve the
>> duplication issue. Does that sound like the right approach, then?
>>
>> What would the approach be for SQL queries? I'm not sure the partition
>> counter is applicable.
>>
>> --
>> Denis
>>
>> On Thu, Dec 13, 2018 at 11:16 AM Vladimir Ozerov
>> wrote:
>>
>> > The partition counter is an internal implementation detail that has no
>> > sensible meaning to end users. It should not be exposed through the
>> > public API.
>> >
>> > On Thu, Dec 13, 2018 at 10:14 PM Denis Magda wrote:
>> >
>> > > Hello Piotr,
>> > >
>> > > That's a known problem, and I thought a JIRA ticket already existed.
>> > > However, I failed to locate it. The ticket for the improvement should
>> > > be created as a result of this conversation.
>> > >
>> > > Speaking of the initial query type, I would differentiate between
>> > > ScanQueries and SqlQueries. For the former, it sounds reasonable to
>> > > apply the partitionCounter logic. As for the latter, Vladimir Ozerov,
>> > > will it be addressed as part of the MVCC/Transactional SQL activities?
>> > >
>> > > Btw, Piotr, what's your initial query type?
>> > >
>> > > --
>> > > Denis
>> > >
>> > > On Thu, Dec 13, 2018 at 3:28 AM Piotr Romański <piotr.romanski@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi, as suggested by Ilya here:
>> > > >
>> > > > http://apache-ignite-users.70518.x6.nabble.com/Continuous-queries-and-duplicates-td25314.html
>> > > >
>> > > > I'm resending it to the developers list.
>> > > >
>> > > > From that thread we know that there might be duplicates between the
>> > > > initial query results and the listener entries received as part of a
>> > > > continuous query. That means users need to deduplicate data manually.
>> > > >
>> > > > In my opinion, manual deduplication may, in some use cases, lead to
>> > > > memory problems on the client side. In order to remove the duplicated
>> > > > notifications we receive in the local listener, we need to keep all
>> > > > initial query results in memory (or at least their unique ids).
>> > > > Unfortunately, there is no way (is there?) to find a point in time
>> > > > when we can be sure that no more dups will arrive. That would mean
>> > > > that we need to keep that data indefinitely and use it every time a
>> > > > new notification arrives. With multiple continuous queries run from a
>> > > > single JVM, this might eventually become a memory or performance
>> > > > problem. I can see the following possible improvements to Ignite:
>> > > >
>> > > > 1. Deduplication between the initial query and incoming notifications
>> > > > could be done entirely in Ignite. As far as I know, there is already
>> > > > an updateCounter and a partition id for every object, so they could
>> > > > be used internally.
>> > > >
>> > > > 2. Add a guarantee that notifications arriving in the local listener
>> > > > after the query() method returns are not duplicates. This kind of
>> > > > functionality would require specific synchronization inside Ignite.
>> > > > It would also mean that the query() method cannot return before all
>> > > > potential duplicates are processed by the local listener, which looks
>> > > > wrong.
>> > > >
>> > > > 3. Notify users that, starting from a given notification, they can be
>> > > > sure they will not receive any more duplicates. This could be an
>> > > > additional boolean flag in the CacheQueryEntryEvent.
>> > > >
>> > > > 4. CacheQueryEntryEvent already exposes the partitionUpdateCounter.
>> > > > Unfortunately, we don't have this information for initial query
>> > > > results. If we did, a client could manually deduplicate notifications
>> > > > and get rid of the initial query results for a given partition after
>> > > > newer notifications arrive. It would also be very convenient to
>> > > > expose the partition id, but for now we can figure it out using the
>> > > > affinity service. The assumption here is that notifications are
>> > > > ordered by partitionUpdateCounter (is that true?).
>> > > >
>> > > > Please correct me if I'm missing anything.
>> > > >
>> > > > What do you think?
>> > > >
>> > > > Piotr
>> > > >
>> > >
>> >
>>