calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: Join affinity
Date Fri, 24 Jul 2015 23:31:31 GMT
Much as we would like to do without them, hints are inevitable (like death and taxes). Some
users will never be happy unless you give them a hints mechanism.

So, add your requirements to https://issues.apache.org/jira/browse/CALCITE-482. 

> On Jul 24, 2015, at 1:33 PM, Aman Sinha <asinha@maprtech.com> wrote:
> 
> Yes, smarter costing + good statistics on column NDV would achieve the
> purpose but we do know that (a) column stats may not be available and (b)
> even with stats the desired join ordering may not happen.  Either some type
> of optimizer hint or heuristic built into the optimizer would be useful.
> 
> Aman
> 
> On Fri, Jul 24, 2015 at 12:13 PM, Jacques Nadeau <jacques@apache.org> wrote:
> 
>> Why wouldn't costing alone solve this?  It seems like the Join operator
>> could understand these types of things enough to have a smarter costing
>> function.
>> 
>> 
>> On Fri, Jul 24, 2015 at 11:47 AM, Aman Sinha <asinha@maprtech.com> wrote:
>> 
>>> In many situations we want to  have some kind of join 'affinity' for a
>> pair
>>> of tables T1, T2  where these 2 tables must be joined to each other
>> before
>>> being joined to any other table.
>>> 
>>> Two such situations are:
>>> 1.   SELECT * FROM t1, t2 WHERE t1.a1 IN (1, 2, 3.....thousands) AND
>> t1.b1
>>> = t2.b2
>>>      Here, the IN list can be represented by a VALUES operator and
>> exposed
>>> as a table
>>>      to be joined to t1.  It is important to do this join first before
>>> joining t1 to t2.
>>>           - This might be addressed by doing a SemiJoin instead of inner
>>> join.  Would that
>>>              be the right approach ?
>>> 
>>> 2.  SELECT * FROM t1,  t1_dimension, t2 WHERE  ....
>>>     Here, I want to join t1 and t1_dimension before joining t1 with t2.
>>> The reason is
>>>     t1_dimension has a column that is the primary key of t1, so if there
>>> is a filter on a
>>>     dimension column, I can get the corresponding primary keys and do
>> the
>>> join with t1.
>>>     One can think of t1_dimension as basically an index table for t1.
>>>        -  For this case, I don't want to rely on costing since the join
>>> planning may not
>>>           necessarily pick the join order I want.   Is there a good way
>> to
>>> achieve this in
>>>           Calcite ?
>>> 
>>> thanks,
>>> Aman
>>> 
>> 


Mime
View raw message