From: Denis Magda
To: dev@ignite.apache.org
Subject: Re: Rework storage format to index-organized approach
Date: Mon, 27 Nov 2017 18:15:30 -0800
Message-Id: <7CB8E3DA-32A1-4AD8-8507-2C471857547A@apache.org>

Vladimir,

How will the free lists be affected by the index-organized architecture? From what I see, they’re becoming optional.

-- Denis

> On Nov 27, 2017, at 12:46 PM, Vladimir Ozerov wrote:
>
> Igniters,
>
> I'd like to start a discussion about a new storage format for Ignite. Our
> current approach is so-called *heap-organized* storage with a secondary index
> per partition. It has a number of drawbacks:
> 1) Slow scans (joins, OLAP workloads): data is written in arbitrary order,
> so iteration over the base index leads to multiple page reads and page locks.
> 2) Slow writes in OLTP workloads: every update touches multiple
> index and free-list pages (a kind of write amplification).
> 3) Duplicated PK index when SQL is enabled: our base index cannot be used
> for lookups or range scans. This makes the write amplification effects even
> worse.
>
> All mature RDBMS systems employ an alternative format as default:
> *index-organized* storage. In this case the primary index leaf pages are the
> data pages. Rows are sorted inside data pages. This gives:
> - Blazingly fast scans (no dereference, fewer page reads, fewer evictions,
> fewer locks)
> - Fast writes in OLTP workloads when the PK index column (e.g. ID) grows
> monotonically (you need to *update only one page* if there are no splits)
> - Slower random writes due to index fragmentation compared to heap
>
> I propose to adopt this approach in two phases:
> 1) Optionally add data to leaf pages [1]. This should improve our ScanQuery
> dramatically.
> 2) Optionally have a single primary index instead of a per-partition index [2].
> This should improve our updates and SQL scans at the cost of harder
> rebalance and recovery.
>
> Thoughts?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7026
> [2] https://issues.apache.org/jira/browse/IGNITE-7027
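
The scan argument in the quoted proposal can be illustrated with a toy model. The sketch below is not Ignite code: the class name, the method names, and the page capacity of 4 rows are all hypothetical, and it assumes unique keys. It only counts how often a key-ordered scan has to move to a different data page under each layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Toy model only (not Ignite internals): page switches during a key-ordered scan. */
final class StorageLayoutSketch {
    /** Hypothetical page capacity, chosen only to keep the example small. */
    static final int ROWS_PER_PAGE = 4;

    /**
     * Heap-organized: rows are appended to data pages in arrival order,
     * and a separate sorted index maps each key to the page holding its row.
     * Keys are assumed to be unique.
     */
    static int heapScanPageSwitches(int[] keysInArrivalOrder) {
        TreeMap<Integer, Integer> index = new TreeMap<>();
        for (int i = 0; i < keysInArrivalOrder.length; i++) {
            index.put(keysInArrivalOrder[i], i / ROWS_PER_PAGE);
        }
        // Scanning in key order dereferences the index entry for every row.
        return countSwitches(index.values());
    }

    /**
     * Index-organized: rows live in the leaf pages themselves, sorted by key,
     * so the page of a row is determined by its position in key order.
     */
    static int indexOrganizedScanPageSwitches(int[] keysInArrivalOrder) {
        int[] sorted = keysInArrivalOrder.clone();
        Arrays.sort(sorted);
        List<Integer> pages = new ArrayList<>();
        for (int i = 0; i < sorted.length; i++) {
            pages.add(i / ROWS_PER_PAGE);
        }
        return countSwitches(pages);
    }

    /** Counts how many times consecutive rows sit on different pages. */
    static int countSwitches(Iterable<Integer> pagesInScanOrder) {
        int switches = 0;
        int current = -1;
        for (int page : pagesInScanOrder) {
            if (page != current) {
                switches++;
                current = page;
            }
        }
        return switches;
    }

    public static void main(String[] args) {
        int[] keys = {5, 1, 7, 3, 0, 6, 2, 4};
        System.out.println("heap-organized:  " + heapScanPageSwitches(keys));
        System.out.println("index-organized: " + indexOrganizedScanPageSwitches(keys));
    }
}
```

With these eight keys the heap layout alternates between its two pages on every row, while the sorted layout reads each page once. The model ignores page splits, locking, and free lists, so it says nothing about the write-side trade-offs raised in the thread.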