From dev-return-48620-archive-asf-public=cust-asf.ponee.io@ignite.apache.org Tue Dec 10 12:56:08 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id BCB3E180630 for ; Tue, 10 Dec 2019 13:56:07 +0100 (CET) Received: (qmail 46651 invoked by uid 500); 10 Dec 2019 12:56:07 -0000 Mailing-List: contact dev-help@ignite.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@ignite.apache.org Delivered-To: mailing list dev@ignite.apache.org Received: (qmail 46639 invoked by uid 99); 10 Dec 2019 12:56:06 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Dec 2019 12:56:06 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 120B21A1861 for ; Tue, 10 Dec 2019 12:56:06 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.551 X-Spam-Level: X-Spam-Status: No, score=-0.551 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, URI_HEX=0.1] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=mail.ru Received: from mx1-ec2-va.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id vbw8Topj5KBP for ; Tue, 10 Dec 2019 12:56:01 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=94.100.177.105; helo=smtp45.i.mail.ru; envelope-from=kondakov87@mail.ru; receiver= Received: from smtp45.i.mail.ru (smtp45.i.mail.ru [94.100.177.105]) by mx1-ec2-va.apache.org (ASF Mail Server at mx1-ec2-va.apache.org) with ESMTPS id 83C9ABC509 for ; Tue, 10 Dec 2019 12:56:01 +0000 (UTC) X-ASF-DKIM-Sig: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=mail.ru; s=mail2; h=Content-Transfer-Encoding:Content-Type:MIME-Version:Date:Message-ID:To:Subject:From; bh=5w7lHjqf93Sj8VbjCkxBdSwj497nyAWYTYJKXWP+y00=; b=uLE7yq3vx0MURCvNlOPCtqSFdk0UWckjG8e9K7YwgLh/UZV7Nunrt1kttEhPDg/5/Xlekv645r1yAByMbrSRlbBtl33AlR8KbIazHuCZpk/d2MLh5xuxqrvbkZE+D0thPPQAmeA6AMch6xPrq3h3Su+uvTt7QawQLzLH69rrkg4=; Received: by smtp45.i.mail.ru with esmtpa (envelope-from ) id 1ief3R-0003D8-5A for dev@ignite.apache.org; Tue, 10 Dec 2019 15:55:53 +0300 From: Roman Kondakov Subject: Adding support for Ignite secondary indexes to Apache Calcite planner To: dev@ignite.apache.org Message-ID: <9c27ccd1-04c7-4721-b651-439ff4eebe76@mail.ru> Date: Tue, 10 Dec 2019 15:55:52 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Authentication-Results: smtp45.i.mail.ru; auth=pass smtp.auth=kondakov87@mail.ru smtp.mailfrom=kondakov87@mail.ru X-77F55803: 0A44E481635329DB0E1AA8A03B392317179C3E6C7981FFF22E414E05DFD0F911B62FD000FE3E8B1BF688BCB05C26794D735BAB88EB1AB25578DA64B004A1C29D3D15B59C29B0196B188E28E32518D109CEBD7663CDD6CE63 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE78952527CC63A935BEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F79006376FC24771456F8BB48638F802B75D45FF5571747095F342E8C7A0BC55FA0FE5FC84CE13365E86C620525DAB92F552BA0521D03AFB2531888A389733CBF5DBD5E913377AFFFEAFD269A417C69337E82CC2CC7F00164DA146DAFE8445B8C89999725571747095F342E8C26CFBAC0749D213D2E47CDBA5A9658359CC434672EE6371117882F4460429728AD0CFFFB425014E40A5AABA2AD371193AA81AA40904B5D9A18204E546F3947CE6D7C9137AE18D266136E347CC761E074AD6D5ED66289B52E1A3F18E62937ED6A8C29749A63C2581725E5C173C3A84C376223EE1711880FBBA3038C0950A5D36B5C8C57E37DE458B5A02E28F51ECAD4422CA9DD8327EE4930A3850AC1BE2E735C9AFE1E703BC9514BF002750C4F20CAE731C566533BA786A40A5AABA2AD371193C9F3DD0FB1AF5EBDFC194086D65061027F269C8F02392CD5571747095F342E88FB05168BE4CE3AF X-Mailru-Sender: 020BB76CEAF69E22162117EFD3EE70F96246E4021005775E54B0EE117C781696A40125B5669B6FA0880A9ADA9F10316EBC7555A253F5B20052F876A85A48D3A9EEA116A7084406B00D4ABDE8C577C2ED X-Mras: OK Hi all! As you may know there is an activity on integration of Apache Calcite query optimizer into Ignite codebase is being carried out [1],[2]. One of a bunch of problems in this integration is the absence of out-of-the-box support for secondary indexes in Apache Calcite. After some research I came to conclusion that this problem has a couple of workarounds. Let's name them 1. Phoenix-style approach - representing secondary indexes as materialized views which are natively supported by Calcite engine [3] 2. Drill-style approach - pushing filters into the table scans and choose appropriate index for lookups when possible [4] Both these approaches have advantages and disadvantages: Phoenix style pros: - natural way of adding indexes as an alternative source of rows: index can be considered as a kind of sorted materialized view. - possibility of using index sortedness for stream aggregates, deduplication (DISTINCT operator), merge joins, etc. - ability to support other types of indexes (i.e. functional indexes). Phoenix style cons: - polluting optimizer's search space extra table scans hence increasing the planning time. Drill style pros: - easier to implement (although it's questionable). - search space is not inflated. Drill style cons: - missed opportunity to exploit sortedness. There is a good discussion about using both approaches can be found in [5]. I made a small sketch [6] in order to demonstrate the applicability of the Phoenix approach to Ignite. Key design concepts are: 1. On creating indexes are registered as tables in Calcite schema. This step is needed for internal Calcite's routines. 2. On planner initialization we register these indexes as materialized views in Calcite's optimizer using VolcanoPlanner#addMaterialization method. 3. Right before the query execution Calcite selects all materialized views (indexes) which can be potentially used in query. 4. During the query optimization indexes are registered by planner as usual TableScans and hence can be chosen by optimizer if they have lower cost. This sketch shows the ability to exploit index sortedness only. So the future work in this direction should be focused on using indexes for fast index lookups. At first glance FilterableTable and FilterTableScanRule are good points to start. We can push Filter into the TableScan and then use FilterableTable for fast index lookups avoiding reading the whole index on TableScan step and then filtering its output on the Filter step. What do you think? [1] http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine [3] https://issues.apache.org/jira/browse/PHOENIX-2047 [4] https://issues.apache.org/jira/browse/DRILL-6381 [5] https://issues.apache.org/jira/browse/DRILL-3929 [6] https://github.com/apache/ignite/pull/7115 -- Kind Regards Roman Kondakov