From dev-return-56460-archive-asf-public=cust-asf.ponee.io@phoenix.apache.org Tue May 14 23:40:04 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 5E7FC18065D for ; Wed, 15 May 2019 01:40:04 +0200 (CEST) Received: (qmail 78773 invoked by uid 500); 14 May 2019 23:40:03 -0000 Mailing-List: contact dev-help@phoenix.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@phoenix.apache.org Delivered-To: mailing list dev@phoenix.apache.org Received: (qmail 78735 invoked by uid 99); 14 May 2019 23:40:02 -0000 Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 14 May 2019 23:40:02 +0000 Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id D0E0AE2BFE for ; Tue, 14 May 2019 23:40:01 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id DB1F325817 for ; Tue, 14 May 2019 23:40:00 +0000 (UTC) Date: Tue, 14 May 2019 23:40:00 +0000 (UTC) From: "Daniel Wong (JIRA)" To: dev@phoenix.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (PHOENIX-5280) Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/PHOENIX-5280?page=3Dcom.atlass= ian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Wong updated PHOENIX-5280: --------------------------------- Description:=20 Provide Improvements to Scan on Composite PK where Leading Edge not fully S= pecified but the edge next columns are in most leading keys Recently a user has had an issue where they have a composite pk with 2 colu= mns say (organizationId varchar,=C2=A0departmentId varchar).=C2=A0 They wan= t to query all their data with a condition where department is fully qualif= ied department.=C2=A0 Example SELECT * FROM TABLE WHERE=C2=A0 departmentId= =3D'123'.=C2=A0 They also know that 95% of the organization leading edge co= ntains the qualified trailing edge.=C2=A0 However=C2=A0department =3D '123'= is less than 5% of the total data in the table. Based on the explain plan today for this we would run a Round Robin Full Sc= an with a filter on departmentId=3D'123'. While one possible approach to not run a full table scan is=C2=A0to build = an index on department. Another approach could be to construct a new versio= n of a skipscan like filter to control this scan.=C2=A0 Essentially we coul= d use 1 lookup to find the organizationId then=C2=A0additional skipscan for= the=C2=A0trailing=C2=A0key.=C2=A0 This could be=C2=A0triggered with a sql = syntax hint or in the future data driven. For a given region assume the data looks like this. ||organizationId||departmentId|| |org1|100| |org4|100| |org4|101| |org4|123| |org5|100| |org5|123| First query the initial row in the region.=C2=A0 We get 'org1','100'.=C2=A0= From this we can construct the next rows of ['org1','123' -=C2=A0'org1','1= 23\x0').=C2=A0 After processing that block (in our case 0 rows) we would ru= n=C2=A0to=C2=A0the row at or greater than=C2=A0 nextKey(current orgnaziatio= nId),'123'.=C2=A0 This would give us org4,101.=C2=A0 We would then run to t= he row of 'org4','123'.=C2=A0 Essentially 1 step to find the orgId and then= a scan of all the departments for that value. was: Provide Improvements to Scan on Composite PK where Leading Edge not fully S= pecified but the edge next columns are in most leading keys Recently a user has had an issue where they have a composite pk with 2 colu= mns say (organizationId varchar,=C2=A0departmentId varchar).=C2=A0 They wan= t to query all their data with a condition where department is fully qualif= ied department.=C2=A0 Example SELECT * FROM TABLE WHERE=C2=A0 departmentId= =3D'123'.=C2=A0 They also know that 95% of the organization leading edge co= ntains the qualified trailing edge.=C2=A0 However=C2=A0department =3D '123'= is less than 5% of the total data in the table. Based on the explain plan today for this we would run a Round Robin Full Sc= an with a filter on departmentId=3D'123'. While one possible approach to not run a full table scan is=C2=A0to build = an index on department. Another approach could be to construct a new versio= n of a skipscan like filter to control this scan.=C2=A0 Essentially we coul= d use 1 lookup to find the organizationId then=C2=A0additional skipscan for= the=C2=A0trailing=C2=A0key.=C2=A0 This could be=C2=A0triggered with a sql = syntax hint or in the future data driven. For a given region assume the data looks like this. ||organizationId||departmentId|| |org1|100| |org4|100| |org4|101| |org4|123| |org5|100| |org5|123| First query the initial row in the region.=C2=A0 We get 'org1','100'.=C2=A0= From this we can construct the next rows of ['org1','123' -=C2=A0'org1','1= 23\x0').=C2=A0 After proessing that block (in our case 0 rows) we would run= =C2=A0to=C2=A0the row at or greater than=C2=A0 nextKey(current orgnaziation= Id),'123'.=C2=A0 This would give us org4,101.=C2=A0 We would then run to th= e row of 'org4','123'.=C2=A0 Essentailly 1 step to find the orgId and then = a scan of all the departments for that value. > Provide Improvements to Scan on Composite PK where Leading Edge not fully= Specified but the edge next columns are in most leading keys > -------------------------------------------------------------------------= ------------------------------------------------------------- > > Key: PHOENIX-5280 > URL: https://issues.apache.org/jira/browse/PHOENIX-5280 > Project: Phoenix > Issue Type: Improvement > Reporter: Daniel Wong > Priority: Minor > > Provide Improvements to Scan on Composite PK where Leading Edge not fully= Specified but the edge next columns are in most leading keys > Recently a user has had an issue where they have a composite pk with 2 co= lumns say (organizationId varchar,=C2=A0departmentId varchar).=C2=A0 They w= ant to query all their data with a condition where department is fully qual= ified department.=C2=A0 Example SELECT * FROM TABLE WHERE=C2=A0 departmentI= d=3D'123'.=C2=A0 They also know that 95% of the organization leading edge c= ontains the qualified trailing edge.=C2=A0 However=C2=A0department =3D '123= ' is less than 5% of the total data in the table. > Based on the explain plan today for this we would run a Round Robin Full = Scan with a filter on departmentId=3D'123'. > While one possible approach to not run a full table scan is=C2=A0to buil= d an index on department. Another approach could be to construct a new vers= ion of a skipscan like filter to control this scan.=C2=A0 Essentially we co= uld use 1 lookup to find the organizationId then=C2=A0additional skipscan f= or the=C2=A0trailing=C2=A0key.=C2=A0 This could be=C2=A0triggered with a sq= l syntax hint or in the future data driven. > For a given region assume the data looks like this. > ||organizationId||departmentId|| > |org1|100| > |org4|100| > |org4|101| > |org4|123| > |org5|100| > |org5|123| > First query the initial row in the region.=C2=A0 We get 'org1','100'.=C2= =A0 From this we can construct the next rows of ['org1','123' -=C2=A0'org1'= ,'123\x0').=C2=A0 After processing that block (in our case 0 rows) we would= run=C2=A0to=C2=A0the row at or greater than=C2=A0 nextKey(current orgnazia= tionId),'123'.=C2=A0 This would give us org4,101.=C2=A0 We would then run t= o the row of 'org4','123'.=C2=A0 Essentially 1 step to find the orgId and t= hen a scan of all the departments for that value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)