phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-1556) Base hash versus sort merge join decision on cost
Date Fri, 09 Feb 2018 19:36:02 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358851#comment-16358851 ]

James Taylor commented on PHOENIX-1556:
---------------------------------------

Wow, this is really awesome, [~maryannxue]. I love the tests. A couple of questions:
- Should UNION_DISTINCT_FACTOR be 1.0 since we only support UNION ALL currently?
{code}
+        if (!all) {
+            rows *= UNION_DISTINCT_FACTOR;
+        }
{code}
- What's the reasoning behind stripSkipScanFilter? Is it removed because its effect is
already incorporated into the bytes-scanned estimate?
- Should RowCountVisitor have a method for distinct? In particular, we have an optimization
for DISTINCT on the leading PK columns that affects cost. This optimization
is not identified until runtime, so we might need to tweak the code so that we know about it at
compile time. This could be done in a separate patch.
- Somewhat orthogonal to your pull request (but maybe building on top of it), do you think it'd be
possible to prevent a "too expensive" query from running (assuming "too expensive"
would be defined by a config property)? Something to keep in mind; I can file a separate
JIRA for this.
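
On the UNION_DISTINCT_FACTOR question, here is a minimal sketch of how such a factor might enter a UNION row estimate. The class name, the method, and the choice to sum child estimates are hypothetical illustrations, not Phoenix's actual RowCountVisitor code. While only UNION ALL is supported, the `!all` branch never runs, so a neutral factor of 1.0 leaves every estimate unchanged and would only start to matter once UNION DISTINCT exists.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of a row-count estimate for a UNION node.
 * UnionRowEstimate and estimateUnionRows are illustrative names,
 * not Phoenix's actual API.
 */
final class UnionRowEstimate {
    // Neutral value: with only UNION ALL supported, the estimate is never scaled.
    static final double UNION_DISTINCT_FACTOR = 1.0;

    static double estimateUnionRows(List<Double> childRows, boolean all) {
        double rows = 0.0;
        for (double r : childRows) {
            rows += r; // UNION ALL keeps every row from every branch
        }
        if (!all) {
            // Would shrink the estimate for UNION DISTINCT if the factor were < 1.0.
            rows *= UNION_DISTINCT_FACTOR;
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Double> children = Arrays.asList(100.0, 50.0);
        System.out.println(estimateUnionRows(children, true));  // 150.0
        System.out.println(estimateUnionRows(children, false)); // 150.0 with a factor of 1.0
    }
}
```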

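For the "too expensive" idea, a rough sketch of a cost guard, assuming the optimizer produces a single numeric cost estimate per plan (a simplification). The property name `phoenix.query.maxCost`, the `QueryCostGuard` class, and the error message are all hypothetical; only `java.util.Properties` and `java.sql.SQLException` are real JDK types.

```java
import java.sql.SQLException;
import java.util.Properties;

/**
 * Hypothetical sketch: reject a query before execution when its estimated
 * cost exceeds a configured limit. Names here are assumptions, not Phoenix API.
 */
final class QueryCostGuard {
    static final String MAX_COST_PROPERTY = "phoenix.query.maxCost"; // hypothetical property

    private final double maxCost;

    QueryCostGuard(Properties props) {
        // No limit unless the property is set.
        String value = props.getProperty(MAX_COST_PROPERTY);
        this.maxCost = (value == null) ? Double.POSITIVE_INFINITY : Double.parseDouble(value);
    }

    /** Throws if the (assumed single-number) cost estimate is over the limit. */
    void checkCost(double estimatedCost) throws SQLException {
        if (estimatedCost > maxCost) {
            throw new SQLException("Query rejected: estimated cost " + estimatedCost
                    + " exceeds " + MAX_COST_PROPERTY + "=" + maxCost);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MAX_COST_PROPERTY, "1000");
        QueryCostGuard guard = new QueryCostGuard(props);
        try {
            guard.checkCost(500);  // within the limit: no exception
            guard.checkCost(5000); // over the limit: throws
        } catch (SQLException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Leaving the limit unset by default keeps today's behavior; only users who opt in via the config would see queries rejected.
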
> Base hash versus sort merge join decision on cost
> -------------------------------------------------
>
>                 Key: PHOENIX-1556
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1556
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>            Priority: Major
>              Labels: CostBasedOptimization
>         Attachments: PHOENIX-1556.patch
>
>
> At compile time, we know how many guideposts (i.e. how many bytes) will be scanned for
> the RHS table. We should, by default, base the decision of using the hash join versus the many-to-many
> join on this information.
> Another criterion (as we've seen in PHOENIX-4508) is whether the tables being
> joined are already ordered by the join key. In that case, it's better to always use the sort
> merge join.
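
The decision described in the issue can be sketched as follows. This is a hypothetical illustration, not Phoenix's implementation: `JoinStrategy`, `chooseJoinStrategy`, and the byte threshold are assumed names, and `rhsEstimatedBytes` stands in for whatever the guidepost statistics yield at compile time.

```java
/**
 * Hypothetical sketch of choosing between hash join and sort-merge join.
 * JoinStrategy and JoinStrategyChooser are illustrative names, not Phoenix API.
 */
enum JoinStrategy { HASH_JOIN, SORT_MERGE_JOIN }

final class JoinStrategyChooser {
    /**
     * @param rhsEstimatedBytes bytes the RHS scan is expected to read (e.g. from guideposts)
     * @param maxHashBytes assumed upper bound on an RHS that can be built into a hash table
     * @param orderedByJoinKey whether both inputs already come back ordered by the join key
     */
    static JoinStrategy chooseJoinStrategy(long rhsEstimatedBytes,
                                           long maxHashBytes,
                                           boolean orderedByJoinKey) {
        // Inputs already sorted on the join key: sort-merge needs no extra sort, so prefer it.
        if (orderedByJoinKey) {
            return JoinStrategy.SORT_MERGE_JOIN;
        }
        // Small RHS: build a hash table from it and probe with the LHS.
        if (rhsEstimatedBytes <= maxHashBytes) {
            return JoinStrategy.HASH_JOIN;
        }
        // Large RHS: too big to hash in memory, so fall back to sort-merge.
        return JoinStrategy.SORT_MERGE_JOIN;
    }

    public static void main(String[] args) {
        long limit = 100L * 1024 * 1024; // hypothetical 100 MB threshold
        System.out.println(chooseJoinStrategy(10L * 1024 * 1024, limit, false));  // HASH_JOIN
        System.out.println(chooseJoinStrategy(500L * 1024 * 1024, limit, false)); // SORT_MERGE_JOIN
        System.out.println(chooseJoinStrategy(10L * 1024 * 1024, limit, true));   // SORT_MERGE_JOIN
    }
}
```

Checking the ordering first reflects the PHOENIX-4508 observation: when both inputs are already sorted on the join key, sort-merge wins regardless of how small the RHS is.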



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
