phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zhenhua Xu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-3430) Optimizer not using all columns from secondary index
Date Tue, 01 Nov 2016 14:12:58 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zhenhua Xu updated PHOENIX-3430:
--------------------------------
    Description: 
================ Setup =================
DROP TABLE IF EXISTS TEST.TEMP;

CREATE TABLE TEST.TEMP (
    ORGANIZATION_ID CHAR(15) NOT NULL,
    NETWORK_ID CHAR(15) NOT NULL,
    ENTITY_ID CHAR(15) NOT NULL,
    SCORE DOUBLE
    CONSTRAINT TOP_ENTITY_PK PRIMARY KEY (
        ORGANIZATION_ID,
        NETWORK_ID,
        ENTITY_ID
    )
) VERSIONS=1;

CREATE INDEX IF NOT EXISTS TEMP_INDEX ON TEST.TEMP (ORGANIZATION_ID, NETWORK_ID, SCORE DESC,
ENTITY_ID DESC);

EXPLAIN
SELECT entity_id, MAX(score) FROM TEST.TEMP
WHERE organization_id = 'organization_id'
  AND (network_id = 'network_id' OR network_id='network_id1')
  AND ((score = 9.0 AND entity_id < 'entity_id') OR score < 9.0)
GROUP BY entity_id
ORDER BY MAX(score) DESC, entity_id DESC
LIMIT 100;

=============== Execution Plan ===============
-CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 KEYS OVER TEST.TEMP_INDEX ['organization_id','network_id
    '] - ['organization_id','network_id1    ']
    --SERVER FILTER BY FIRST KEY ONLY AND ((TO_DOUBLE("SCORE") = 9.0 AND "ENTITY_ID" <
'entity_id') OR TO_DOUBLE("SCORE") < 9.0)
    --SERVER AGGREGATE INTO DISTINCT ROWS BY ["ENTITY_ID"]
-CLIENT MERGE SORT
-CLIENT TOP 100 ROWS SORTED BY [MAX(TO_DOUBLE("SCORE")) DESC, "ENTITY_ID" DESC]

The execution plan shows a server-side skip scans using only the first 2 columns in the TEMP_INDEX
secondary index. It could have used the SCORE and ENTITY_ID columns  to speed up server side
scans also.

  was:
================ Setup =================
DROP TABLE IF EXISTS TEST.TEMP;

CREATE TABLE TEST.TEMP (
    ORGANIZATION_ID CHAR(15) NOT NULL,
    NETWORK_ID CHAR(15) NOT NULL,
    ENTITY_ID CHAR(15) NOT NULL,
    SCORE DOUBLE
    CONSTRAINT TOP_ENTITY_PK PRIMARY KEY (
        ORGANIZATION_ID,
        NETWORK_ID,
        ENTITY_ID
    )
) VERSIONS=1;

CREATE INDEX IF NOT EXISTS TEMP_INDEX ON TEST.TEMP (ORGANIZATION_ID, NETWORK_ID, SCORE DESC,
ENTITY_ID DESC);

EXPLAIN
SELECT entity_id, MAX(score) FROM TEST.TEMP
WHERE organization_id = 'organization_id'
  AND (network_id = 'network_id' OR network_id='network_id1')
  AND ((score = 9.0 AND entity_id < 'entity_id') OR score < 9.0)
GROUP BY entity_id
ORDER BY MAX(score) DESC, entity_id DESC
LIMIT 100;

=============== Execution Plan ===============
CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 KEYS OVER TEST.TEMP_INDEX ['organization_id','network_id
    '] - ['organization_id','network_id1    ']
    SERVER FILTER BY FIRST KEY ONLY AND ((TO_DOUBLE("SCORE") = 9.0 AND "ENTITY_ID" < 'entity_id')
OR TO_DOUBLE("SCORE") < 9.0)
    SERVER AGGREGATE INTO DISTINCT ROWS BY ["ENTITY_ID"]
CLIENT MERGE SORT
CLIENT TOP 100 ROWS SORTED BY [MAX(TO_DOUBLE("SCORE")) DESC, "ENTITY_ID" DESC]

The execution plan shows a server-side skip scans using only the first 2 columns in the TEMP_INDEX
secondary index. It could have used the SCORE and ENTITY_ID columns  to speed up server side
scans also.


> Optimizer not using all columns from secondary index
> ----------------------------------------------------
>
>                 Key: PHOENIX-3430
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3430
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.8.0
>            Reporter: Zhenhua Xu
>
> ================ Setup =================
> DROP TABLE IF EXISTS TEST.TEMP;
> CREATE TABLE TEST.TEMP (
>     ORGANIZATION_ID CHAR(15) NOT NULL,
>     NETWORK_ID CHAR(15) NOT NULL,
>     ENTITY_ID CHAR(15) NOT NULL,
>     SCORE DOUBLE
>     CONSTRAINT TOP_ENTITY_PK PRIMARY KEY (
>         ORGANIZATION_ID,
>         NETWORK_ID,
>         ENTITY_ID
>     )
> ) VERSIONS=1;
> CREATE INDEX IF NOT EXISTS TEMP_INDEX ON TEST.TEMP (ORGANIZATION_ID, NETWORK_ID, SCORE
DESC, ENTITY_ID DESC);
> EXPLAIN
> SELECT entity_id, MAX(score) FROM TEST.TEMP
> WHERE organization_id = 'organization_id'
>   AND (network_id = 'network_id' OR network_id='network_id1')
>   AND ((score = 9.0 AND entity_id < 'entity_id') OR score < 9.0)
> GROUP BY entity_id
> ORDER BY MAX(score) DESC, entity_id DESC
> LIMIT 100;
> =============== Execution Plan ===============
> -CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 KEYS OVER TEST.TEMP_INDEX ['organization_id','network_id
    '] - ['organization_id','network_id1    ']
>     --SERVER FILTER BY FIRST KEY ONLY AND ((TO_DOUBLE("SCORE") = 9.0 AND "ENTITY_ID"
< 'entity_id') OR TO_DOUBLE("SCORE") < 9.0)
>     --SERVER AGGREGATE INTO DISTINCT ROWS BY ["ENTITY_ID"]
> -CLIENT MERGE SORT
> -CLIENT TOP 100 ROWS SORTED BY [MAX(TO_DOUBLE("SCORE")) DESC, "ENTITY_ID" DESC]
> The execution plan shows a server-side skip scans using only the first 2 columns in the
TEMP_INDEX secondary index. It could have used the SCORE and ENTITY_ID columns  to speed up
server side scans also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message