phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-3905) Allow dynamic filtered join queries in UPSERT SELECT to be distributed across cluster
Date Thu, 01 Jun 2017 22:39:04 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Taylor updated PHOENIX-3905:
----------------------------------
    Description: 
Joins on the leading part of the primary key are executed as batches of point queries (as opposed
to a broadcast hash join), and thus could be distributed across the cluster to improve performance
when used in an UPSERT SELECT. The explain plan for these queries indicates that a dynamic filter
will be performed, like this:
{code}
DYNAMIC SERVER FILTER BY (DML.PK1, DML.PK2, DML.PK3)
IN ((COM.PK1, COM.PK2, COM.PK3))
{code}
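
To make the "batches of point queries" idea concrete, here is a minimal Python sketch of a dynamic-filter style join. It is an illustration only, not Phoenix's implementation: the function names, the `batch_size` parameter, and the use of a dict as a stand-in for the DML table are all assumptions, and for simplicity the join key is treated as the full (PK1, PK2, PK3) primary key.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def dynamic_filter_join(com_keys, dml_table, batch_size=2):
    """Join by looking up each COM key directly in DML.

    `dml_table` is a dict keyed by (pk1, pk2, pk3), standing in for a table
    whose primary key supports point lookups. Each batch plays the role of
    one filter of the form (DML.PK1, DML.PK2, DML.PK3) IN (<batch of keys>).
    """
    results = []
    for batch in batched(com_keys, batch_size):
        for key in batch:
            row = dml_table.get(key)  # point lookup, no scan of DML
            if row is not None:
                results.append((key, row))
    return results


dml = {(1, "a", 10): "x", (2, "b", 20): "y", (3, "c", 30): "z"}
com = [(1, "a", 10), (3, "c", 30), (9, "q", 0)]
matches = dynamic_filter_join(com, dml)
```

Only the keys present on the COM side are ever looked up, which is why each batch could in principle be handled independently by whichever node owns those keys.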

Currently, for these types of UPSERT SELECT queries, the selected data flows back to the
client and then back out to the appropriate server. The work is still parallelized, but only
on a single client as opposed to across multiple region servers in the cluster. The benefit
would depend on how many region servers are involved in fetching the data for the select
part of the query.
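
As a rough way to reason about that dependence, the following Python sketch compares how many rows pass through the busiest node in each mode. It is a deliberately simplified model, not a Phoenix cost model: the function name and the per-server row counts are hypothetical, and network and write costs are ignored.

```python
def rows_through_busiest_node(rows_per_region_server, distributed):
    """Return the row count handled by the most heavily loaded node.

    Client-side execution funnels every selected row through one client.
    Distributed execution (the proposal) lets each region server handle
    only the rows it fetched. Simplified model; ignores network cost.
    """
    counts = rows_per_region_server.values()
    if distributed:
        return max(counts, default=0)
    return sum(counts)


# Hypothetical distribution of selected rows across three region servers.
spread = {"rs1": 400, "rs2": 300, "rs3": 300}
client_side = rows_through_busiest_node(spread, distributed=False)
server_side = rows_through_busiest_node(spread, distributed=True)

# If a single region server serves all the rows, nothing is gained.
single = {"rs1": 1000}
no_gain = rows_through_busiest_node(single, distributed=True)
```

Under this model the gain grows with the number of region servers sharing the select work, and vanishes when one server holds all of the selected rows.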

  was:
Joins on the leading part of the primary key are executed as batches of point queries (as opposed
to a broadcast hash join), and thus could be distributed across the cluster to improve performance
when used in an UPSERT SELECT. The explain plan for these queries indicates that a dynamic filter
will be performed, like this:
{code}
DYNAMIC SERVER FILTER BY (DML.PK1, DML.PK2, DML.PK3)
IN ((COM.PK1, COM.PK2, COM.PK3))
{code}



> Allow dynamic filtered join queries in UPSERT SELECT to be distributed across cluster
> -------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3905
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3905
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>
> Joins on the leading part of the primary key are executed as batches of point queries (as
> opposed to a broadcast hash join), and thus could be distributed across the cluster to improve
> performance when used in an UPSERT SELECT. The explain plan for these queries indicates that
> a dynamic filter will be performed, like this:
> {code}
> DYNAMIC SERVER FILTER BY (DML.PK1, DML.PK2, DML.PK3)
> IN ((COM.PK1, COM.PK2, COM.PK3))
> {code}
> Currently, for these types of UPSERT SELECT queries, the selected data flows back
> to the client and then back out to the appropriate server. The work is still parallelized, but
> only on a single client as opposed to across multiple region servers in the cluster. The benefit
> would depend on how many region servers are involved in fetching the data for the select
> part of the query.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
