Date: Tue, 1 Nov 2016 14:12:58 +0000 (UTC)
From: "Zhenhua Xu (JIRA)"
To: dev@phoenix.apache.org
Subject: [jira] [Updated] (PHOENIX-3430) Optimizer not using all columns from secondary index

[ https://issues.apache.org/jira/browse/PHOENIX-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhenhua Xu updated PHOENIX-3430:
--------------------------------
Description:

================ Setup =================

DROP TABLE IF EXISTS TEST.TEMP;

CREATE TABLE TEST.TEMP (
    ORGANIZATION_ID CHAR(15) NOT NULL,
    NETWORK_ID CHAR(15) NOT NULL,
    ENTITY_ID CHAR(15) NOT NULL,
    SCORE DOUBLE
    CONSTRAINT TOP_ENTITY_PK PRIMARY KEY (
        ORGANIZATION_ID,
        NETWORK_ID,
        ENTITY_ID
    )
) VERSIONS=1;

CREATE INDEX IF NOT EXISTS TEMP_INDEX ON TEST.TEMP (ORGANIZATION_ID, NETWORK_ID, SCORE DESC, ENTITY_ID DESC);

EXPLAIN
SELECT entity_id, MAX(score) FROM TEST.TEMP
WHERE organization_id = 'organization_id'
  AND (network_id = 'network_id' OR network_id = 'network_id1')
  AND ((score = 9.0 AND entity_id < 'entity_id') OR score < 9.0)
GROUP BY entity_id
ORDER BY MAX(score) DESC, entity_id DESC
LIMIT 100;

=============== Execution Plan ===============

-CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 KEYS OVER TEST.TEMP_INDEX ['organization_id','network_id '] - ['organization_id','network_id1 ']
--SERVER FILTER BY FIRST KEY ONLY AND ((TO_DOUBLE("SCORE") = 9.0 AND "ENTITY_ID" < 'entity_id') OR TO_DOUBLE("SCORE") < 9.0)
--SERVER AGGREGATE INTO DISTINCT ROWS BY ["ENTITY_ID"]
-CLIENT MERGE SORT
-CLIENT TOP 100 ROWS SORTED BY [MAX(TO_DOUBLE("SCORE")) DESC, "ENTITY_ID" DESC]

The execution plan shows a server-side skip scan using only the first two columns of the TEMP_INDEX secondary index. It could also have used the SCORE and ENTITY_ID columns to narrow the server-side scan.
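To make the report's point concrete, here is a minimal sketch (in Python, not Phoenix internals) of how the predicate could be folded into the skip-scan key ranges. It assumes the TEMP_INDEX column order (ORGANIZATION_ID, NETWORK_ID, SCORE DESC, ENTITY_ID DESC); the functions `skip_scan_ranges` and `row_in_range` are hypothetical names for illustration only. The key observation is that `((score = 9.0 AND entity_id < 'entity_id') OR score < 9.0)` is a single contiguous range in (SCORE DESC, ENTITY_ID DESC) sort order, so the optimizer could, in principle, start each per-(org, network) scan just past the composite bound (9.0, 'entity_id') instead of filtering every row server-side:

```python
# Illustrative sketch only -- NOT Phoenix code. Models how the WHERE clause
# from PHOENIX-3430 maps onto skip-scan key ranges over all four columns of
# TEMP_INDEX (ORGANIZATION_ID, NETWORK_ID, SCORE DESC, ENTITY_ID DESC).
from itertools import product


def skip_scan_ranges(org_ids, network_ids, score_bound, entity_bound):
    """Build one key range per (org, network) combination.

    ((score = bound AND entity_id < e) OR score < bound) is a contiguous
    suffix range in descending (SCORE, ENTITY_ID) order: everything that
    sorts strictly after the composite key (score_bound, entity_bound).
    """
    return [
        # Exclusive start bound for the trailing DESC-ordered columns.
        (org, net, score_bound, entity_bound)
        for org, net in product(org_ids, network_ids)
    ]


def row_in_range(row, rng):
    """Check a row against one range, using descending sort semantics."""
    org, net, score, entity = row
    lo_org, lo_net, lo_score, lo_entity = rng
    if (org, net) != (lo_org, lo_net):
        return False
    # In DESC order, "after the bound" means lexicographically smaller:
    # score < lo_score, or score == lo_score and entity < lo_entity --
    # exactly the OR predicate from the report's query.
    return (score, entity) < (lo_score, lo_entity)
```

With the report's literals, this yields two ranges (one per network_id), each already excluding the rows the current plan can only reject with a server-side filter:

```python
rngs = skip_scan_ranges(['organization_id'],
                        ['network_id', 'network_id1'],
                        9.0, 'entity_id')
```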
> Optimizer not using all columns from secondary index
> ----------------------------------------------------
>
>                 Key: PHOENIX-3430
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3430
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.8.0
>            Reporter: Zhenhua Xu

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)