From: Vladimir Ozerov
Date: Thu, 14 Dec 2017 10:21:39 +0300
Subject: Re: Rework locking architecture for MVCC and transactional SQL
To: dev@ignite.apache.org

Denis,

Sorry, maybe I was not clear enough: the "tuple approach" and the
"persistent approach" are the same. By "tuple" I mean a row stored inside
a data block. Currently we store lock information in the Java heap, and
the proposal is to move it to data blocks. The main driver is memory: if
there are many rows to be locked, we will either run out of memory or
produce serious memory pressure. For example, an update of 1M entries
currently consumes ~500 MB of heap. With the proposed approach it will
consume almost nothing. The drawback is an increased number of dirty data
pages, but that should not be a problem, because in the final
implementation we will update data rows before the prepare phase anyway,
so I do not expect any write amplification in the usual case.

This approach is only applicable to Ignite persistence.

On Thu, Dec 14, 2017 at 1:53 AM, Denis Magda wrote:

> Vladimir,
>
> Thanks for a thorough overview and proposal.
>
> > Also, we could try employing a tiered approach:
> > 1) Try to keep everything in memory to minimize writes to blocks.
> > 2) Fall back to persistent lock data if a certain threshold is reached.
>
> What are the benefits of the backed-by-persistence approach compared to
> the one based on tuples? Specifically:
> - will the persistence approach work for both 3rd party and Ignite
>   persistence?
> - any performance impacts depending on a chosen method?
> - what’s faster to implement?
>
> --
> Denis
>
> > On Dec 13, 2017, at 2:10 AM, Vladimir Ozerov wrote:
> >
> > Igniters,
> >
> > As you probably know, we are actively working on the MVCC [1] and
> > transactional SQL [2] features, which could be treated as a single
> > huge improvement. We face a number of challenges, and one of them is
> > locking.
> >
> > At the moment, information about all locks is kept in memory on a
> > per-entry basis (see GridCacheMvccManager). For every locked key we
> > maintain the current lock owner (XID) and the list of would-be-owner
> > transactions. When a transaction is about to lock an entry, two
> > scenarios are possible:
> > 1) If the entry is not locked, we obtain the lock immediately.
> > 2) If the entry is locked, we add the current transaction to the wait
> > list and jump to the next entry to be locked. Once the first entry is
> > released by the conflicting transaction, the current transaction
> > becomes the owner of the first entry and tries to promote itself for
> > subsequent entries.
> >
> > Once all required locks are obtained, a response is sent to the
> > caller.
> >
> > This approach doesn't work well for transactional SQL: if we update
> > millions of rows in a single transaction, we will simply run out of
> > memory. To mitigate the problem, other database vendors keep
> > information about locks inside the tuples. I propose to apply a
> > similar design, as follows:
> >
> > 1) No per-entry lock information is stored in memory anymore.
> > 2) The list of active transactions is still maintained in memory.
> > 3) When a TX locks an entry, it sets a special marker on the tuple [3].
> > 4) When a TX meets an already locked entry, it enlists itself in the
> > wait queue of the conflicting transaction and suspends.
> > 5) When the first transaction releases the conflicting lock, it
> > notifies and wakes up the suspended transactions, so they resume
> > locking.
> > 6) Entry lock data is cleared on transaction commit.
> > 7) Entry lock data is not cleared on rollback or node restart;
> > instead, we could use the active transactions list to identify
> > invalid locks and overwrite them as needed.
> >
> > Also, we could try employing a tiered approach:
> > 1) Try to keep everything in memory to minimize writes to blocks.
> > 2) Fall back to persistent lock data if a certain threshold is reached.
> >
> > Thoughts?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-3478
> > [2] https://issues.apache.org/jira/browse/IGNITE-4191
> > [3] Depends on the final MVCC design: it could be per-tuple XID, undo
> > vectors, per-block transaction lists, etc.
> >
> > Vladimir.
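
For readers who want to see the proposed steps 1) to 7) in motion, here is a
minimal single-threaded sketch in Java. It is only an illustration under
stated assumptions, not Ignite code: the class TupleLockSketch, the Tuple
type, the lockXid field, and the begin/tryLock/commit/rollback methods are
all hypothetical names, the "special marker" is assumed to be a per-tuple
XID (one of the options mentioned in [3]), and "suspending" a transaction is
modeled simply by returning false and recording it in a wait queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical single-threaded model of tuple-resident locks; not Ignite code. */
class TupleLockSketch {
    /** Assumed marker value meaning "no lock recorded in the tuple". */
    static final long NO_LOCK = 0L;

    /** A row inside a data block; the lock marker lives in the row itself (steps 1 and 3). */
    static final class Tuple {
        long lockXid = NO_LOCK;
    }

    /** Step 2: only the active transactions, with their wait queues, stay in memory. */
    private final Map<Long, Queue<Long>> waitersByActiveTx = new HashMap<>();

    /** Registers a transaction as active. */
    void begin(long xid) {
        waitersByActiveTx.put(xid, new ArrayDeque<>());
    }

    /** Returns true if xid owns the lock afterwards, false if it was enqueued to wait. */
    boolean tryLock(long xid, Tuple tuple) {
        long owner = tuple.lockXid;
        if (owner == xid) {
            return true; // already locked by this transaction
        }
        // Step 3: free tuple. Step 7: marker left by a transaction that is
        // no longer active is treated as invalid and overwritten.
        if (owner == NO_LOCK || !waitersByActiveTx.containsKey(owner)) {
            tuple.lockXid = xid;
            return true;
        }
        // Step 4: enlist in the wait queue of the conflicting transaction.
        waitersByActiveTx.get(owner).add(xid);
        return false;
    }

    /** Step 6: commit clears the markers. Step 5: returns the transactions to wake up. */
    List<Long> commit(long xid, List<Tuple> lockedTuples) {
        for (Tuple tuple : lockedTuples) {
            if (tuple.lockXid == xid) {
                tuple.lockXid = NO_LOCK;
            }
        }
        return finish(xid);
    }

    /** Step 7: rollback leaves the markers in place; they become stale. */
    List<Long> rollback(long xid) {
        return finish(xid);
    }

    /** Removes xid from the active list and hands back its waiters. */
    private List<Long> finish(long xid) {
        Queue<Long> waiters = waitersByActiveTx.remove(xid);
        return waiters == null ? new ArrayList<>() : new ArrayList<>(waiters);
    }

    public static void main(String[] args) {
        TupleLockSketch locks = new TupleLockSketch();
        Tuple row = new Tuple();
        locks.begin(1L);
        locks.begin(2L);
        System.out.println("tx1 locks row: " + locks.tryLock(1L, row));
        System.out.println("tx2 locks row: " + locks.tryLock(2L, row));
        System.out.println("woken on tx1 commit: " + locks.commit(1L, List.of(row)));
        System.out.println("tx2 retries: " + locks.tryLock(2L, row));
    }
}
```

The point of the sketch is the memory shape: the only heap state is one map
entry per active transaction, while per-row lock state is a single field in
the row. A real implementation would also need concurrency control on the
data page, deadlock handling, and persistence of the marker, none of which
is modeled here.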