Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 9A2B4200BDB for ; Fri, 4 Nov 2016 00:39:00 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 98EFE160AFF; Thu, 3 Nov 2016 23:39:00 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id E0587160B0B for ; Fri, 4 Nov 2016 00:38:59 +0100 (CET) Received: (qmail 82320 invoked by uid 500); 3 Nov 2016 23:38:58 -0000 Mailing-List: contact dev-help@phoenix.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@phoenix.apache.org Delivered-To: mailing list dev@phoenix.apache.org Received: (qmail 82110 invoked by uid 99); 3 Nov 2016 23:38:58 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 03 Nov 2016 23:38:58 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 605572C0D55 for ; Thu, 3 Nov 2016 23:38:58 +0000 (UTC) Date: Thu, 3 Nov 2016 23:38:58 +0000 (UTC) From: "Samarth Jain (JIRA)" To: dev@phoenix.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (PHOENIX-3439) Query using an RVC based on the base table PK is incorrectly using an index and doing a full scan instead of a point query MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Thu, 03 Nov 2016 23:39:00 -0000 [ https://issues.apache.org/jira/browse/PHOENIX-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634669#comment-15634669 ] Samarth Jain commented on PHOENIX-3439: --------------------------------------- +1 > Query using an RVC based on the base table PK is incorrectly using an index and doing a full scan instead of a point query > -------------------------------------------------------------------------------------------------------------------------- > > Key: PHOENIX-3439 > URL: https://issues.apache.org/jira/browse/PHOENIX-3439 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.8.1 > Reporter: Jan Fernando > Attachments: PHOENIX-3439.patch > > > We use Phoenix RVCs to support paginated queries. This performance of this functionality relies on Phoenix predictably generating scans against a table or index with a PK that matches the RVC specified for each page. > What we do is that on the initial query we use PhoenixRuntime.getPkColsDataTypesForSql() to get the list of PK columns and persist that and use those to generate RVCs for paginated queries. > We have discovered that for queries where: > a) the user doesn't specify an ORDER BY > b) for tables where secondary indexes are present > Phoenix returns pk cols for the base table via getPkColsDataTypesForSql() but then subsequent queries using the RVCs to paginate execute against a secondary index doing a full scan. > We have a table with a secondary index where this is an issue. The base table has a PK of PKCOL1, PKCOL2, PKCOL3 and > PKCOL4. We have an immutable secondary index where the PK is PKCOL1, PKCOL3, PKCOL2, PKCOL4. > Here's what happens: > Here is our query we run to get the Query plan from which we generate the RVCs to be used for paging: > EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2 > FROM MY_TABLES."MYTABLE" > LIMIT 501; > I get the following explain: > CLIENT 6-CHUNK SERIAL 1-WAY ROUND ROBIN RANGE SCAN OVER MY_TABLES.MY_TABLE ['00Dxx0000001gFA'] > SERVER 501 ROW LIMIT > CLIENT 501 ROW LIMIT > Therefore the columns we record for RVCs for paging are PK1, PK2, PK3, PK4 from MY_TABLES.MY_TABLE > However when I generate the RVC query to page through the data: > EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2 > FROM MY_TABLES."MYTABLE" > (pkcol1, pkcol2, pkcol3,pkcol4) > ('001','001xx000003DHml',to_date('2015-10-21 09 (tel:2015102109):50:55.0'),'017xx0000022FuI') > LIMIT 501; > I get the follow explain plan: > CLIENT 24-CHUNK 7811766 ROWS 6291457403 BYTES PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER MY_TABLES.MY_SECONDARY_INDEX ['00Dxx0000001gFA','001'] - ['00Dxx0000001gFA',*] > SERVER FILTER BY ("PKCOL1", "PKCOL2, "PKCOL3", "PKCOL4") > (TO_CHAR('001'), TO_CHAR('001xx000003DHml'), DATE '2015-10-21 09 (tel:2015102109):50:55.000', TO_CHAR('017xx0000022FuI')) > SERVER 501 ROW LIMIT > CLIENT 501 ROW LIMIT > We expected that the second query with RVCs above would execute against the base table as the base table PK is PKCOL1, PKCOL2, PKCOL3, PKCOL4 and the index PK is PKCOL1, PKCOL3, PKCOL2, PKCOL4. -- This message was sent by Atlassian JIRA (v6.3.4#6332)