phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joel Palmert (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PHOENIX-3438) Phoenix should be able to optimize LIMUT query with IN clause
Date Thu, 03 Nov 2016 16:34:59 GMT
Joel Palmert created PHOENIX-3438:
-------------------------------------

             Summary: Phoenix should be able to optimize LIMUT query with IN clause
                 Key: PHOENIX-3438
                 URL: https://issues.apache.org/jira/browse/PHOENIX-3438
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.8.0
            Reporter: Joel Palmert


This is taken from a real production issue we're having but has been simplified here. Steps
to repro:

CREATE TABLE IF NOT EXISTS TEST.TEST (
    ORGANIZATION_ID CHAR(15) NOT NULL,
    SCORE INTEGER NOT NULL,
    ENTITY_ID CHAR(15)
    CONSTRAINT TEST_PK PRIMARY KEY (
        ORGANIZATION_ID,
        SCORE DESC
    )
) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=43200

EXPLAIN
SELECT entity_id, MAX(score) max_score
FROM test.test
WHERE organization_id IN ('org1','org3')
GROUP BY entity_id
ORDER BY max_score DESC
LIMIT 1

CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 KEYS OVER TEST.TEST ['org1           '] - ['org2
          ']
    SERVER AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID]
CLIENT MERGE SORT
CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC]

Notice that Phoenix gets and sends all the rows for the two organizations to the client. Since
the ORDER BY is in PK order I would expect it to just 
1. Get the first row for each organization (in index order) rather than scanning all the rows.
2. Merge them (only 2 rows in this case)
3. Limit 1

Consider the following query and plan for what I would expect to see:
EXPLAIN
SELECT entity_id, MAX(score) max_score
FROM(
	SELECT entity_id, score
	FROM(
		SELECT entity_id, score
		FROM test.test
		WHERE organization_id = 'org1'
		ORDER BY score DESC
		LIMIT 1)
	UNION ALL
	SELECT entity_id, score
		FROM(
		SELECT entity_id, score
		FROM test.test
		WHERE organization_id = 'org2'
		ORDER BY score DESC
		LIMIT 1))
GROUP BY entity_id
ORDER BY max_score DESC
LIMIT 1

UNION ALL OVER 2 QUERIES
    CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org1           ']
        SERVER 1 ROW LIMIT
    CLIENT 1 ROW LIMIT
    CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org2           ']
        SERVER 1 ROW LIMIT
    CLIENT 1 ROW LIMIT
CLIENT SORTED BY [ENTITY_ID]
CLIENT AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID]
CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC]






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message