calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: Cluster mismatch between RelNodes of a query and a materialized view
Date Wed, 08 Mar 2017 19:40:07 GMT
> On Mar 8, 2017, at 11:27 AM, Ashutosh Chauhan <hashutosh@apache.org> wrote:
> 
> Fact that
> RelFactories exist is evidence enough that LogicalRelNodes are not
> sufficient outside of Calcite.

Not true.

Hive uses its own RelNodes far more than other projects do. The other projects do a significant
part of their planning in the logical domain, and only later move to the physical domain.
They use RelFactories so that they can re-use rules when they are in the physical domain.

Materialized view substitution is a logical operation and it makes sense to do it in the logical
domain.

I don’t recall the details of the Hive RelNodes but isn’t it true that in most cases they
have the same information as the corresponding Logical RelNode?

We can find a solution to this problem for Hive RelNodes, but I still feel that Hive overuses
Hive RelNodes and that causes inefficiencies for both Hive and Calcite.

Julian




> On Mar 8, 2017, at 11:27 AM, Ashutosh Chauhan <hashutosh@apache.org> wrote:
> 
> Working on LogicalRelNodes doesnt solve problem for Hive (or any other
> project which use RelFactories to generate custom rel node). Fact that
> RelFactories exist is evidence enough that LogicalRelNodes are not
> sufficient outside of Calcite. We have to solve this problem in that
> context.
> 
> On Wed, Mar 8, 2017 at 10:50 AM, Remus Rusanu <rrusanu@hortonworks.com>
> wrote:
> 
>> To make sure I understand, you’re saying that the cloning would take a
>> tree of various Xxx operators (eg. HiveProject) and generate a tree of
>> equivalent LogicalXxx  operators (HiveProject -> LogicalProject,
>> HiveFilter->LogicalFilter, HiveAggregate ->  LogicalAggregate and so on).
>> Is my understanding correct? Ignoring Hive issues, wouldn’t that loose
>> information?
>> For Hive specific case, the HiveXxx operator extend Xxx not LogicalXxx
>> (not sure yet if this is an issue or not) but for sure they carry lots of
>> extra info that would be lost if the clone would contain LogicalXxx instead
>> of HiveXxx.
>> 
>> Thanks,
>> ~Remus
>> 
>> On 3/8/17, 9:45 AM, "Julian Hyde" <jhyde@apache.org> wrote:
>> 
>>    Could you work on LogicalXxx rather than HiveXxx? I know the Hive team
>> likes to do everything in terms of HiveXxx RelNodes but I’m not sure it has
>> to be that way.
>> 
>>    Julian
>> 
>>> On Mar 8, 2017, at 9:37 AM, Remus Rusanu <rrusanu@hortonworks.com>
>> wrote:
>>> 
>>> Agree on the RelOptCluster.
>>> 
>>> I noticed how adding new method to RelNode yields a gargantuan task
>> (from the list of compile errors I got…). But I’m not sure a RelShuttle can
>> handle this. For my test case the 3 nodes that need to be cloned are
>> HiveScanTable, HiveFilter and HiveProject, all declared in Hive and not
>> even extending AbstractRelNode but directly base RelNode. A RelShuttle
>> wouldn’t know about these types, and wouldn’t be able to create them (short
>> of using reflection and adhering to a strict constructor signature, which I
>> think is much too fragile). What I did to get the ball rolling I added a
>> default implementation in AbstractRelNode (basically assert ‘must be
>> implemented by subclass’), this allowed me to test easily and, as a proof
>> of concept, I have it working. But I reckon is fragile test wise, and new
>> RelNode types wouldn’t know about the requirement to provide a copyTo
>> implementation.
>>> 
>>> Thanks,
>>> ~Remus
>>> 
>>> On 3/8/17, 9:05 AM, "Julian Hyde" <jhyde@apache.org> wrote:
>>> 
>>>   The argument should be a RelOptCluster, not a RelOptPlanner. The
>> link from RelNode to planner is indirect currently (via cluster) and will
>> be non-existent after CALCITE-1536.
>>> 
>>>   I question whether we need a new method. Putting an abstract
>> method on RelNode is a huge burden because every RelNode sub-class needs to
>> be fixed when people upgrade. Even a non-abstract method imposes a
>> conceptual burden: more methods to understand.
>>> 
>>>   So, my approach would be to sub-class RelShuttle. It’s sufficient
>> that it only works for LogicalXxx nodes.
>>> 
>>>   No need to copy RexNode expressions. They are immutable.
>>> 
>>>   Julian
>>> 
>>> 
>>>> On Mar 8, 2017, at 4:14 AM, Remus Rusanu <rrusanu@hortonworks.com>
>> wrote:
>>>> 
>>>> I created CALCITE-1681 https://issues.apache.org/
>> jira/browse/CALCITE-1681 and I intent to work on it for finishing
>> HIVE-15708.
>>>> My current thinking is to create a RelCopier based on RelShuttle
>> and add a new abstract RelNode.copyTo(RelOptPlanner) that each concrete Rel
>> type must override. The Rex part is already handled by the existing
>> RexCopier.
>>>> 
>>>> Thanks,
>>>> ~Remus
>>>> 
>>>> On 3/6/17, 12:30 PM, "Julian Hyde" <jhyde@apache.org> wrote:
>>>> 
>>>>  Every RelNode belongs to a RelOptCluster, and basically there is
>> one RelOptCluster created each time a query is prepared. When working with
>> materialized views, the view’s query is represented as a tree of RelNodes,
>> that tree is used for optimizing more than one query. When planning a
>> particular query, the nodes of that query will have a different
>> RelOptCluster than the nodes of the materialized view(s) they are matched
>> against.
>>>> 
>>>>  How do we deal with this? Do we copy the nodes into the query’s
>> cluster once we have found a match? If so, how? I couldn’t find a sub-class
>> of RelVisitor or RelShuttle that copies trees to a different RelOptCluster.
>>>> 
>>>>  By the way, https://issues.apache.org/jira/browse/CALCITE-1536 <
>> https://issues.apache.org/jira/browse/CALCITE-1536> aims to improve the
>> RelNode life-cycle but I don’t think it will solve this problem.
>>>> 
>>>>  Julian
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
>> 


Mime
View raw message