phoenix-dev mailing list archives

From "Maryann Xue (JIRA)" <>
Subject [jira] [Comment Edited] (PHOENIX-1556) Base hash versus sort merge join decision on cost
Date Mon, 05 Feb 2018 08:32:00 GMT


Maryann Xue edited comment on PHOENIX-1556 at 2/5/18 8:31 AM:

Could you please review the patch for me, [~jamestaylor]?

The {{CostBasedDecisionIT#testJoinStrategyXXX()}} tests verify and demonstrate how
join strategies are chosen in different scenarios.

The calculation of join costs is designed to follow the guidelines below:
 # The build side is weighted more heavily than the probe side when calculating the cost
of a hash-join, so the smaller table is guaranteed to be the build side.
 # If the build side exceeds the size limit (QueryServices.MAX_SERVER_CACHE_SIZE_ATTRIB),
the cost of the hash-join is treated as infinite.
 # The cost of a sort-merge-join alone is guaranteed to be smaller than that of a hash-join. But when
sorting is required before the sort-merge-join, the cost of the added "order-by" is counted
in the total cost and may thus make the sort-merge-join operation as a whole more expensive.
 # The {{QueryCompiler#compileJoinQuery()}} method compares the costs of the plans from
all applicable join strategies and chooses a local optimum. As a result, with multiple joins
(joins between more than two tables), a mix of join strategies may be chosen for the final
plan. For example, (A hash-join B) sort-merge-join C.
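
To illustrate how guidelines 1 to 3 interact, here is a minimal, self-contained sketch. It is not the code from the patch: the class name {{JoinCostSketch}}, the weight constants, and the n·log(n) sort cost are all hypothetical choices made only so that the three rules hold, and the cache limit is passed in as a plain parameter rather than read from QueryServices.MAX_SERVER_CACHE_SIZE_ATTRIB.

```java
// Hypothetical sketch of the cost rules described above; NOT Phoenix's actual cost model.
final class JoinCostSketch {
    enum Strategy { HASH_JOIN, SORT_MERGE_JOIN }

    // Assumed weights: build > probe (guideline 1), merge < probe (guideline 3).
    static final double BUILD_WEIGHT = 2.0;
    static final double PROBE_WEIGHT = 1.0;
    static final double MERGE_WEIGHT = 0.5;

    // Cost of a hash-join for one fixed orientation (which side is the build side).
    static double hashJoinCost(long buildBytes, long probeBytes, long maxCacheBytes) {
        if (buildBytes > maxCacheBytes) {
            return Double.POSITIVE_INFINITY; // guideline 2: build side does not fit
        }
        return BUILD_WEIGHT * buildBytes + PROBE_WEIGHT * probeBytes;
    }

    // Cheapest hash-join over both orientations; because the build side carries
    // more weight, the smaller table ends up as the build side.
    static double bestHashJoinCost(long lhsBytes, long rhsBytes, long maxCacheBytes) {
        return Math.min(hashJoinCost(lhsBytes, rhsBytes, maxCacheBytes),
                        hashJoinCost(rhsBytes, lhsBytes, maxCacheBytes));
    }

    // Assumed n*log2(n) cost of the extra "order-by" for an unsorted input.
    static double sortCost(long bytes) {
        double n = Math.max(bytes, 2);
        return n * (Math.log(n) / Math.log(2));
    }

    // Merge cost alone is below any hash-join cost; added sorts may push it above.
    static double sortMergeJoinCost(long lhsBytes, long rhsBytes,
                                    boolean lhsSorted, boolean rhsSorted) {
        double cost = MERGE_WEIGHT * (lhsBytes + rhsBytes);
        if (!lhsSorted) cost += sortCost(lhsBytes);
        if (!rhsSorted) cost += sortCost(rhsBytes);
        return cost;
    }

    // Pick the cheaper strategy for a single two-table join.
    static Strategy choose(long lhsBytes, long rhsBytes,
                           boolean lhsSorted, boolean rhsSorted, long maxCacheBytes) {
        double hash = bestHashJoinCost(lhsBytes, rhsBytes, maxCacheBytes);
        double merge = sortMergeJoinCost(lhsBytes, rhsBytes, lhsSorted, rhsSorted);
        return hash <= merge ? Strategy.HASH_JOIN : Strategy.SORT_MERGE_JOIN;
    }

    public static void main(String[] args) {
        System.out.println(choose(1000, 100, false, false, 10_000)); // small, unsorted inputs
        System.out.println(choose(1000, 100, true, true, 10_000));   // inputs already ordered
        System.out.println(choose(1000, 100, false, false, 50));     // build side exceeds limit
    }
}
```

Under these assumed weights, two small unsorted tables favor the hash-join, inputs already ordered by the join key favor the sort-merge-join, and a build side larger than the cache limit forces the sort-merge-join regardless of the sort cost. Applying the same comparison join by join is what can produce a mixed plan such as (A hash-join B) sort-merge-join C.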

was (Author: maryannxue):
Could you please review the patch for me, [~jamestaylor]?

> Base hash versus sort merge join decision on cost
> -------------------------------------------------
>                 Key: PHOENIX-1556
>                 URL:
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>            Priority: Major
>              Labels: CostBasedOptimization
>         Attachments: PHOENIX-1556.patch
> At compile time, we know how many guideposts (i.e. how many bytes) will be scanned for
the RHS table. We should, by default, base the decision of using the hash-join versus many-to-many
join on this information.
> Another criterion (as we've seen in PHOENIX-4508) is whether or not the tables being
joined are already ordered by the join key. In that case, it's better to always use the sort
merge join.

This message was sent by Atlassian JIRA
