phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3438) Phoenix should be able to optimize LIMIT query with IN clause
Date Mon, 07 Nov 2016 18:03:59 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15644889#comment-15644889
] 

James Taylor commented on PHOENIX-3438:
---------------------------------------

Make sure to push down the limit to the aggregation:
{code}
SELECT DISTINCT entity_id, score
FROM test.test
WHERE organization_id IN ('org1','org3')
ORDER BY max_score DESC
LIMIT 3
{code}

> Phoenix should be able to optimize LIMIT query with IN clause
> -------------------------------------------------------------
>
>                 Key: PHOENIX-3438
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3438
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.8.0
>            Reporter: Joel Palmert
>            Assignee: James Taylor
>            Priority: Critical
>             Fix For: 4.8.2
>
>
> This is taken from a real production issue we're having but has been simplified here.
Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
>     ORGANIZATION_ID CHAR(15) NOT NULL,
>     SCORE INTEGER NOT NULL,
>     ENTITY_ID CHAR(15)
>     CONSTRAINT TEST_PK PRIMARY KEY (
>         ORGANIZATION_ID,
>         SCORE DESC
>     )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=43200
> EXPLAIN
> SELECT entity_id, MAX(score) max_score
> FROM test.test
> WHERE organization_id IN ('org1','org3')
> GROUP BY entity_id
> ORDER BY max_score DESC
> LIMIT 1
> CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 KEYS OVER TEST.TEST ['org1           ']
- ['org2           ']
>     SERVER AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID]
> CLIENT MERGE SORT
> CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC]
> Notice that Phoenix gets and sends all the rows for the two organizations to the client.
Since the ORDER BY is in PK order I would expect it to just 
> 1. Get the first row for each organization (in index order) rather than scanning all
the rows.
> 2. Merge them (only 2 rows in this case)
> 3. Limit 1
> Consider the following query and plan for what I would expect to see:
> EXPLAIN
> SELECT entity_id, MAX(score) max_score
> FROM(
> 	SELECT entity_id, score
> 	FROM(
> 		SELECT entity_id, score
> 		FROM test.test
> 		WHERE organization_id = 'org1'
> 		ORDER BY score DESC
> 		LIMIT 1)
> 	UNION ALL
> 	SELECT entity_id, score
> 		FROM(
> 		SELECT entity_id, score
> 		FROM test.test
> 		WHERE organization_id = 'org2'
> 		ORDER BY score DESC
> 		LIMIT 1))
> GROUP BY entity_id
> ORDER BY max_score DESC
> LIMIT 1
> UNION ALL OVER 2 QUERIES
>     CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org1           ']
>         SERVER 1 ROW LIMIT
>     CLIENT 1 ROW LIMIT
>     CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org2           ']
>         SERVER 1 ROW LIMIT
>     CLIENT 1 ROW LIMIT
> CLIENT SORTED BY [ENTITY_ID]
> CLIENT AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID]
> CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message