drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-2087) Some planner option setting might produce a wrong plan and wrong query result for join query.
Date Wed, 04 Feb 2015 22:55:35 GMT

     [ https://issues.apache.org/jira/browse/DRILL-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jinfeng Ni updated DRILL-2087:
------------------------------
    Attachment: 0001-DRILL-2087-Fix-an-incorrect-join-plan-by-ensure-eith.patch

> Some planner option setting might produce a wrong plan and wrong query result for join
query.
> ---------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2087
>                 URL: https://issues.apache.org/jira/browse/DRILL-2087
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>         Attachments: 0001-DRILL-2087-Fix-an-incorrect-join-plan-by-ensure-eith.patch
>
>
> I have the following parquet data ( essentially tpch's nation/region tables, but break
into multiple parts).
> {code}
> ls -l nation
> total 24
> -rw-r--r--@ 1 jni  staff  1376 Jan 27 16:47 01.parquet
> -rw-r--r--@ 1 jni  staff  1400 Jan 27 16:47 02.parquet
> -rw-r--r--@ 1 jni  staff  1279 Jan 27 16:47 03.parquet
> ls -l region
> total 24
> -rw-r--r--@ 1 jni  staff  564 Jan 27 16:54 01.parquet
> -rw-r--r--@ 1 jni  staff  491 Jan 27 16:54 02.parquet
> -rw-r--r--@ 1 jni  staff  506 Jan 27 16:54 03.parquet
> {code}
> With the default planner setting, the following query produce the correct result :
> {code}
> 0: jdbc:drill:zk=local> select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region`
t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> +-------------+
> | n_nationkey |
> +-------------+
> | 18          |
> | 12          |
> | 9           |
> | 8           |
> | 21          |
> | 19          |
> | 7           |
> | 6           |
> | 23          |
> | 22          |
> | 20          |
> | 13          |
> | 11          |
> | 10          |
> | 4           |
> | 0           |
> | 16          |
> | 15          |
> | 14          |
> | 5           |
> | 24          |
> | 17          |
> | 3           |
> | 2           |
> | 1           |
> +-------------+
> 25 rows selected (2.057 seconds)
> {code}
> However, if I modify the planner option (`planner.slice_target` and `planner.join.row_count_estimate_factor`
 ) as following, then the same query would return 0 row, which is not correct. 
> {code}
> 0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 10;
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | planner.slice_target updated. |
> +------------+------------+
> 1 row selected (0.093 seconds)
> 0: jdbc:drill:zk=local> alter session set `planner.join.row_count_estimate_factor`
= 0.1;
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | planner.join.row_count_estimate_factor updated. |
> +------------+------------+
> 1 row selected (0.041 seconds)
> 0: jdbc:drill:zk=local> select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region`
t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
> +-------------+
> | n_nationkey |
> +-------------+
> +-------------+
> No rows selected (0.71 seconds)
> {code}
> If we look at explain plan result, after changing the planner option, we will see for
the JOIN operator, the LEFT does not have any EXCHANGE operator, while the RIGHT has HashToRandomExchange
operator.  That seems to be not a valid plan.
> {code}
> 00-00    Screen
> 00-01      UnionExchange
> 01-01        Project(n_nationkey=[$2])
> 01-02          HashJoin(condition=[=($1, $0)], joinType=[inner])
> 01-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/region]],
selectionRoot=/Users/jni/work/data/parquet/region, numFiles=1, columns=[`r_regionkey`]]])
> 01-03            HashToRandomExchange(dist0=[[$0]])
> 02-01              Project(n_regionkey=[$1], n_nationkey=[$0])
> 02-02                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/nation]],
selectionRoot=/Users/jni/work/data/parquet/nation, numFiles=1, columns=[`n_regionkey`, `n_nationkey`]]])
> {code} 
> The cause of this problem seems to be that Drill will  remove EXCHANGE operator under
some conditions.  That condition does not guarantee that a JOIN operator always either has
EXCHANGE on both sides, or none. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message