phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-1556) Base hash versus sort merge join decision on cost
Date Fri, 09 Feb 2018 22:41:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359066#comment-16359066 ]

James Taylor commented on PHOENIX-1556:
---------------------------------------

bq. Yes. stripSkipScanFilter() also aims to eliminate things like PageFilter and looks to
keep only boolean expression filters that cannot be pushed into PK.
One thing with PageFilter is that it represents the limit pushed down to the server. Since
the limit cannot always be pushed down (depending on the query - for example an aggregate
query can push down the limit only if it's aggregating on the leading part of the pk), should
we consider that? Or do you think we can reliably get the limit that's pushed to the server
from the query plan?

bq. A probably more realistic approach here might be to set a configurable "limit" for specific
operators
That's a good idea. I'll file a JIRA and copy/paste your explanation there.

+1 to the patch (assuming tests pass locally for you -- FYI test with the 4.x-HBase-1.3 branch
as there are test failures in master). Great work!

> Base hash versus sort merge join decision on cost
> -------------------------------------------------
>
>                 Key: PHOENIX-1556
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1556
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>            Priority: Major
>              Labels: CostBasedOptimization
>         Attachments: PHOENIX-1556.patch
>
>
> At compile time, we know how many guideposts (i.e. how many bytes) will be scanned for
> the RHS table. We should, by default, base the decision of using the hash-join versus many-to-many
> join on this information.
> Another criterion (as we've seen in PHOENIX-4508) is whether or not the tables being
> joined are already ordered by the join key. In that case, it's better to always use the sort
> merge join.
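
The two criteria in the issue description above can be read as a simple decision rule. The following is only a minimal sketch of that rule in Java, not Phoenix's actual optimizer code: the class `JoinStrategySketch`, the enum `JoinStrategy`, the method `choose`, and the 100 MB threshold are all hypothetical names and values chosen for illustration. It assumes the estimated bytes to scan for the RHS table (derived from guideposts) is already available as a number.

```java
public final class JoinStrategySketch {

    // Hypothetical enum; "many-to-many join" in the issue text is the sort merge join.
    enum JoinStrategy { HASH_JOIN, SORT_MERGE_JOIN }

    // Assumed cutoff for illustration only; not a Phoenix default or config value.
    static final long MAX_HASH_BUILD_BYTES = 100L * 1024 * 1024;

    /**
     * Picks a join strategy from the two criteria in the issue description.
     *
     * @param rhsBytesToScan estimated bytes scanned for the RHS table
     * @param orderedByJoinKey true if the joined tables are already ordered by the join key
     */
    static JoinStrategy choose(long rhsBytesToScan, boolean orderedByJoinKey) {
        // Inputs already ordered by the join key: always prefer the sort merge join.
        if (orderedByJoinKey) {
            return JoinStrategy.SORT_MERGE_JOIN;
        }
        // Otherwise use the hash join only while the RHS is small enough to build.
        return rhsBytesToScan <= MAX_HASH_BUILD_BYTES
                ? JoinStrategy.HASH_JOIN
                : JoinStrategy.SORT_MERGE_JOIN;
    }

    public static void main(String[] args) {
        System.out.println(choose(10L * 1024 * 1024, false));
        System.out.println(choose(10L * 1024 * 1024, true));
        System.out.println(choose(10L * 1024 * 1024 * 1024, false));
    }
}
```

A real cost model would compare estimated costs of both plans rather than a single fixed threshold; the fixed cutoff here just keeps the sketch short.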



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
