phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Daniel Wong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PHOENIX-5280) Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys
Date Tue, 14 May 2019 21:14:00 GMT
Daniel Wong created PHOENIX-5280:
------------------------------------

             Summary: Provide Improvements to Scan on Composite PK where Leading Edge not
fully Specified but the edge next columns are in most leading keys
                 Key: PHOENIX-5280
                 URL: https://issues.apache.org/jira/browse/PHOENIX-5280
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Daniel Wong


Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the
edge next columns are in most leading keys

Recently a user has had an issue where they have a composite pk with 2 columns say (organizationId
varchar, departmentId varchar).  They want to query all their data with a condition where
department is fully qualified department.  Example SELECT * FROM TABLE WHERE  departmentId='123'. 
They also know that 95% of the organization leading edge contains the qualified trailing edge. 
However department = '123' is less than 5% of the total data in the table.

Based on the explain plan today for this we would run a Round Robin Full Scan with a filter
on departmentId='123'.
While one possible approach to not run a full table scan is to build an index on department.
Another approach could be to construct a new version of a skipscan like filter to control
this scan.  Essentially we could use 1 lookup to find the organizaitonId then additoin skipscan
for the trailing key.

For a given region assume the data looks like this.
||organizationId||departmentId||
|org1|100|
|org4|100|
|org4|101|
|org4|123|
|org5|100|
|org5|123|


First query the initial row in the region.  We get 'org1','100'.  From this we can construct
the next rows of ['org1','123' - 'org1','123\x0').  After proessing that block (in our case
0 rows) we would run to the row at or greater than  nextKey(current orgnaziationId),'123'. 
This would give us org4,101.  We would then run to the row of 'org4','123'.  Essentailly
1 step to find the orgId and then a scan of all the departments for that value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message