From: Павлухин Иван (Ivan Pavlukhin)
Date: Wed, 28 Nov 2018 20:26:34 +0300
Subject: Re: Historical rebalance
To: dev@ignite.apache.org

Guys,

One more idea. We can introduce an additional update counter that is
incremented by MVCC transactions right after each operation is executed
(as is done for classic transactions), and use that counter to search
for the needed WAL records. Could that do the trick?

P.S. Mentally I am trying to separate the facilities providing
transactions from those providing durability. It seems to me that they
lie in different dimensions.

On Wed, 28 Nov 2018 at 16:26, Павлухин Иван wrote:
>
> Sorry, if it was stated that the updates of a SINGLE transaction are
> applied in the same order on all replicas, then I have no questions so
> far. I was thinking about reordering of updates coming from different
> transactions.
> > I have not got why we can assume that reordering is not possible. What
> > have I missed?
>
> On Wed, 28 Nov 2018 at 13:26, Павлухин Иван wrote:
> >
> > Hi,
> >
> > Regarding Vladimir's new idea.
> > > We assume that transaction can be represented as a set of
> > > independent operations, which are applied in the same order on
> > > both primary and backup nodes.
> > I have not got why we can assume that reordering is not possible. What
> > have I missed?
> >
> > On Tue, 27 Nov 2018 at 14:42, Seliverstov Igor wrote:
> > >
> > > Vladimir,
> > >
> > > I think I got your point.
> > >
> > > It should work if we do the following:
> > > introduce two structures: an active list (txs) and a candidate list
> > > (updCntr -> txn pairs)
> > >
> > > Track active txs, mapping them to the actual update counter at update time.
> > > On each next update, put the update counter associated with the previous
> > > update into the candidate list, possibly overwriting the existing value
> > > (checking txn).
> > > On tx finish, remove the tx from the active list only if the appropriate
> > > update counter (associated with the finished tx) is applied.
> > > On update counter update, set the minimal update counter from the
> > > candidate list as a back-counter, clear the candidate list and remove the
> > > associated tx from the active list if present.
> > > Use the back-counter instead of the actual update counter in the demand
> > > message.
> > >
> > > On Tue, 27 Nov 2018 at 12:56, Seliverstov Igor wrote:
> > >
> > > > Ivan,
> > > >
> > > > 1) The list is saved on each checkpoint, wholly (all transactions in
> > > > active state at checkpoint begin).
> > > > We need the whole list to get the oldest transaction because after
> > > > the previous oldest tx finishes, we need to get the following one.
> > > >
> > > > 2) I guess there is a description of how persistent storage works and how
> > > > it restores [1]
> > > >
> > > > Vladimir,
> > > >
> > > > the whole list of what we are going to store on checkpoint (updated):
> > > > 1) Partition counter low watermark (LWM)
> > > > 2) WAL pointer of the earliest active transaction write to the partition
> > > > at the time the checkpoint started
> > > > 3) List of prepared txs with acquired partition counters (which were
> > > > acquired but not applied yet)
> > > >
> > > > This way we don't need any additional info in the demand message. The
> > > > start point can be easily determined using the stored WAL "back-pointer".
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-LocalRecoveryProcess
> > > >
> > > > On Tue, 27 Nov 2018 at 11:19, Vladimir Ozerov wrote:
> > > >
> > > >> Igor,
> > > >>
> > > >> Could you please elaborate - what is the whole set of information we are
> > > >> going to save at checkpoint time? From what I understand this should be:
> > > >> 1) List of active transactions with WAL pointers of their first writes
> > > >> 2) List of prepared transactions with their update counters
> > > >> 3) Partition counter low watermark (LWM) - the smallest partition counter
> > > >> before which there are no prepared transactions.
> > > >>
> > > >> And then we send to the supplier node a message: "Give me all updates
> > > >> starting from that LWM plus data for those transactions which were active
> > > >> when I failed".
> > > >>
> > > >> Am I right?
> > > >>
> > > >> On Fri, Nov 23, 2018 at 11:22 AM Seliverstov Igor
> > > >> wrote:
> > > >>
> > > >> > Hi Igniters,
> > > >> >
> > > >> > Currently I'm working on possible approaches to implementing historical
> > > >> > rebalance (delta rebalance using WAL iterator) over MVCC caches.
> > > >> >
> > > >> > The main difficulty is that MVCC writes changes during the tx active
> > > >> > phase, while the partition update version, aka update counter, is applied
> > > >> > on tx finish. This means we cannot start iteration over WAL right from the
> > > >> > pointer where the update counter was updated, but should also include the
> > > >> > updates made by the transaction that updated the counter.
> > > >> >
> > > >> > These updates may be much earlier than the point where the update counter
> > > >> > was updated, so we have to be able to identify the point where the first
> > > >> > update happened.
> > > >> >
> > > >> > The proposed approach includes:
> > > >> >
> > > >> > 1) preserve the list of active txs, sorted by the time of their first
> > > >> > update (using the WAL ptr of the first WAL record in the tx)
> > > >> >
> > > >> > 2) persist this list on each checkpoint (together with TxLog, for
> > > >> > example)
> > > >> >
> > > >> > 3) send the whole active tx list (transactions which were in active state
> > > >> > at the time the node crashed, empty list in case of graceful node stop)
> > > >> > as a part of the partition demand message.
> > > >> >
> > > >> > 4) find a checkpoint where the earliest tx exists in the persisted txs and
> > > >> > use the saved WAL ptr as a start point, or apply the current approach in
> > > >> > case the active tx list (sent on the previous step) is empty
> > > >> >
> > > >> > 5) start iteration.
> > > >> >
> > > >> > Your thoughts?
> > > >> >
> > > >> > Regards,
> > > >> > Igor
> > > >> >
> > > >>
> > > >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
> --
> Best regards,
> Ivan Pavlukhin

--
Best regards,
Ivan Pavlukhin
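
For illustration only, here is a minimal sketch of steps 1 and 4 of the approach proposed in the original message quoted above: keep the active transactions ordered by the WAL pointer of their first write, and start WAL iteration from the earliest such pointer, falling back to the pointer where the update counter was updated when no transaction is active. Everything in it is an assumption made for the example: the class name `ActiveTxTracker`, its methods, and the modelling of WAL pointers and transaction ids as plain `long` values. None of this is Ignite's actual API, and checkpoint persistence (step 2) and the demand message (step 3) are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not Ignite code: tracks the first WAL write of each active tx. */
class ActiveTxTracker {
    /** txId -> WAL pointer of the first record written by that tx. */
    private final Map<Long, Long> firstWriteByTx = new HashMap<>();

    /**
     * WAL pointer of first write -> txId, sorted so the earliest active tx comes first.
     * Assumes WAL pointers are unique and grow monotonically.
     */
    private final TreeMap<Long, Long> txByFirstWrite = new TreeMap<>();

    /** Called on every tx write; only the first write of a tx is remembered. */
    void onWrite(long txId, long walPtr) {
        if (firstWriteByTx.putIfAbsent(txId, walPtr) == null)
            txByFirstWrite.put(walPtr, txId);
    }

    /** Called when a tx commits or rolls back. */
    void onFinish(long txId) {
        Long ptr = firstWriteByTx.remove(txId);
        if (ptr != null)
            txByFirstWrite.remove(ptr);
    }

    /**
     * Start point for WAL iteration: the first write of the earliest active tx,
     * or the counter update pointer (the current approach) if no tx is active.
     */
    long rebalanceStart(long counterUpdatePtr) {
        return txByFirstWrite.isEmpty() ? counterUpdatePtr : txByFirstWrite.firstKey();
    }

    public static void main(String[] args) {
        ActiveTxTracker tracker = new ActiveTxTracker();
        tracker.onWrite(1L, 100L); // tx 1 starts writing at WAL position 100
        tracker.onWrite(2L, 150L); // tx 2 starts writing at WAL position 150
        tracker.onWrite(1L, 200L); // later write of tx 1, does not move its first pointer

        System.out.println(tracker.rebalanceStart(300L)); // 100: tx 1 is the earliest
        tracker.onFinish(1L);
        System.out.println(tracker.rebalanceStart(300L)); // 150: tx 2 is now the earliest
        tracker.onFinish(2L);
        System.out.println(tracker.rebalanceStart(300L)); // 300: no active txs, fallback
    }
}
```

The sorted map is what makes the "whole list" requirement from the thread cheap to satisfy: when the oldest tx finishes, the next oldest is immediately available as the new first key.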