drill-dev mailing list archives

From jinfengni <...@git.apache.org>
Subject [GitHub] drill pull request: DRILL-4531: Add a Drill customized rule for pu...
Date Fri, 25 Mar 2016 19:02:45 GMT
Github user jinfengni commented on the pull request:

    I agree that your expectation for RelSubset makes sense. However, for now it does
not happen that way. The following is the trace for the query that went through planning
with this customized rule (I removed some rels).
    Set#1, type: RecordType(BIGINT custkey, ANY custAddress)
    Set#2, type: (DrillRecordRow[*, l_orderkey, l_partkey, l_linenumber])
      rel#273:Subset#2.LOGICAL.ANY([]).[], best=rel#442, importance=0.31381059609000006
rowcount=100.0, cumulative cost={inf}
        rel#442:DrillScanRel.LOGICAL.ANY([]).[](table=[cp, tpch/lineitem.parquet],groupscan=ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=classpath:/tpch/lineitem.parquet,
numFiles=1, usedMetadataFile=false, columns=[`*`]]), rowcount=60175.0, cumulative cost={60175.0
rows, 6.0175E8 cpu, 0.0 io, 0.0 network, 0.0 memory}
    Set#3, type: (DrillRecordRow[*, l_orderkey, l_partkey, l_linenumber])
      rel#82:Subset#3.NONE.ANY([]).[], best=null, importance=0.3874204890000001
20160101), <=($2, 20160301), OR(=($2, 1), =($2, 2), =($2, 5), =($2, 6)))), rowcount=6.25,
cumulative cost={inf}
    rel#345:LogicalFilter has a child rel#273 with LOGICAL convention.
    As another example, for the following query:
       Select n_name, n_nationkey from cp.`tpch/nation.parquet` where n_nationkey > 5
    The trace:
    Set#0, type: (DrillRecordRow[*, n_nationkey, n_name])
    Set#1, type: (DrillRecordRow[*, n_nationkey, n_name])
      rel#21:Subset#1.NONE.ANY([]).[], best=null, importance=0.81
5)), rowcount=50.0, cumulative cost={inf}
5)), rowcount=50.0, cumulative cost={inf}
      rel#35:Subset#1.LOGICAL.ANY([]).[], best=rel#60, importance=0.81
5)), rowcount=50.0, cumulative cost={125.0 rows, 250600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}
    Set#2, type: RecordType(ANY n_name, ANY n_nationkey)
      rel#23:Subset#2.NONE.ANY([]).[], best=null, importance=0.9
rowcount=50.0, cumulative cost={inf}
    Again, rel#53:LogicalProject has a child rel#35 whose convention is LOGICAL.
    I think the reason we have such mixed rels is that different kinds of rules are used
in a single Volcano planning phase:
     1) Rules that match only the base classes (Filter, Project, etc.).
     2) Rules that match LogicalFilter, LogicalProject, etc.
     3) Rules that use the copy() method to generate a new rel.
     4) Rules that use a RelFactory to generate a new rel.
     5) Convert rules, which convert from Calcite logical (NONE/Enumerable) to Drill logical.
    For instance, ProjectMergeRule, which matches the base Project yet uses the default RelFactory,
will match both LogicalProject and DrillProject, but produce a LogicalProject as its output. That
causes the mixed rels.
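    The effect can be illustrated with a small self-contained sketch. It uses toy classes
only, not the real Calcite or Drill types: the class names mirror the ones mentioned above, but
MixedConventionSketch, rewriteWithDefaultFactory and rewriteWithCopy are hypothetical names
made up for illustration, and the "default factory" is modeled as simply constructing a
LogicalProject.

```java
import java.util.Arrays;
import java.util.List;

public class MixedConventionSketch {
    // Toy stand-in for planner conventions; NOT Calcite's Convention API.
    enum Convention { NONE, LOGICAL }

    // Toy base class playing the role of the base Project rel.
    static abstract class Project {
        final Convention convention;
        final List<String> fields;

        Project(Convention convention, List<String> fields) {
            this.convention = convention;
            this.fields = fields;
        }

        // copy() keeps the concrete class, and therefore the convention.
        abstract Project copy(List<String> newFields);
    }

    static final class LogicalProject extends Project {
        LogicalProject(List<String> fields) { super(Convention.NONE, fields); }
        @Override Project copy(List<String> newFields) { return new LogicalProject(newFields); }
    }

    static final class DrillProject extends Project {
        DrillProject(List<String> fields) { super(Convention.LOGICAL, fields); }
        @Override Project copy(List<String> newFields) { return new DrillProject(newFields); }
    }

    // Rule style 4 (assumed model): matches the base class, but builds its
    // output through a default factory that always yields a LogicalProject.
    static Project rewriteWithDefaultFactory(Project input) {
        return new LogicalProject(input.fields);
    }

    // Rule style 3: matches the base class and builds its output via copy().
    static Project rewriteWithCopy(Project input) {
        return input.copy(input.fields);
    }

    public static void main(String[] args) {
        Project drill = new DrillProject(Arrays.asList("n_name", "n_nationkey"));
        // The factory-based rule silently changes the convention.
        System.out.println(rewriteWithDefaultFactory(drill).convention); // NONE
        // The copy()-based rule preserves it.
        System.out.println(rewriteWithCopy(drill).convention);           // LOGICAL
    }
}
```

    In this toy model, the factory-based rewrite turns a LOGICAL input into a NONE output,
which is the same kind of mix that shows up in the traces above.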
    Two things we may consider to fix this:
    1) Separate the convert rules from the other transformation rules. Apply the convert rules
first, then have the transformation rules match Drill logical rels only. That is similar to what
other systems (Hive) are doing.
    2) Go through every rule we use and make sure that the input and output conventions
of a transformation rule are the same, except for the convert rules.
    Both of these would take considerable effort, though.
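    As a rough sketch of how the two options fit together, the toy model below runs a
conversion phase first and then applies a transformation behind a guard that rejects any
convention change. Everything here is hypothetical (TwoPhasePlanningSketch, Rel, convert,
applyTransformation, plan); it does not use the Calcite or Drill planner APIs and only
illustrates the ordering and the invariant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class TwoPhasePlanningSketch {
    // Toy stand-in for planner conventions; NOT Calcite's Convention API.
    enum Convention { NONE, LOGICAL }

    // Toy rel node: just a name and a convention.
    static final class Rel {
        final String name;
        final Convention convention;

        Rel(String name, Convention convention) {
            this.name = name;
            this.convention = convention;
        }
    }

    // Phase 1 (option 1): the convert rule is the only rule allowed to
    // change the convention.
    static Rel convert(Rel rel) {
        return rel.convention == Convention.LOGICAL
            ? rel
            : new Rel(rel.name, Convention.LOGICAL);
    }

    // Guard (option 2): a transformation must keep the input convention.
    static Rel applyTransformation(UnaryOperator<Rel> rule, Rel input) {
        Rel output = rule.apply(input);
        if (output.convention != input.convention) {
            throw new IllegalStateException(
                "transformation changed convention of " + input.name
                    + " from " + input.convention + " to " + output.convention);
        }
        return output;
    }

    // Convert everything first, then run the transformation on the
    // converted rels only.
    static List<Rel> plan(List<Rel> rels, UnaryOperator<Rel> transformation) {
        List<Rel> converted = new ArrayList<>();
        for (Rel rel : rels) {
            converted.add(convert(rel));
        }
        List<Rel> result = new ArrayList<>();
        for (Rel rel : converted) {
            result.add(applyTransformation(transformation, rel));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rel> rels = Arrays.asList(
            new Rel("Filter", Convention.NONE),
            new Rel("Scan", Convention.LOGICAL));
        // A well-behaved transformation: it rewrites the rel but keeps the convention.
        UnaryOperator<Rel> rewrite = r -> new Rel(r.name + "'", r.convention);
        for (Rel rel : plan(rels, rewrite)) {
            System.out.println(rel.name + " " + rel.convention);
        }
    }
}
```

    With this ordering, a transformation that falls back to the NONE convention fails
fast in the guard instead of leaving mixed rels in the same set.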
