drill-issues mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3196) Disable multiple partition by clauses in the same sql query
Date Sat, 06 Jun 2015 01:10:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575481#comment-14575481 ]

Aman Sinha commented on DRILL-3196:
-----------------------------------

I ran some queries with slice_target = 1 to force exchanges and examined the plans; they
look correct. The hash distribution is done separately on each distinct partition-by column, and the
sort is done on the combination of partition-by and order-by columns. Here's one (note:
here I also disabled the mux exchange):
{code}
00-00    Screen
00-01      ProjectAllowDup(x=[$0], y=[$1])
00-02        UnionExchange
01-01          Project(w0$o0=[$5], w1$o0=[$6])
01-02            Window(window#0=[window(partition {4} order by [3] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [MAX($1)])])
01-03              SelectionVectorRemover
01-04                Sort(sort0=[$4], sort1=[$3], dir0=[ASC], dir1=[ASC])
01-05                  HashToRandomExchange(dist0=[[$4]])
02-01                    Window(window#0=[window(partition {2} order by [3] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [MIN($1)])])
02-02                      SelectionVectorRemover
02-03                        Sort(sort0=[$2], sort1=[$3], dir0=[ASC], dir1=[ASC])
02-04                          HashToRandomExchange(dist0=[[$2]])
03-01                            Project(T9¦¦*=[$0], l_extendedprice=[$1], l_partkey=[$2], l_suppkey=[$3], l_orderkey=[$4])
03-02                              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=/tpch/lineitem.parquet, numFiles=1, columns=[`*`]]])
{code}

Given this, I think Abhishek's point is valid: we should revisit whether it still makes sense
to disable this functionality. Sometimes disabling a specific piece of functionality, and then
testing that it really is disabled, is harder than testing the enabled functionality.
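To make the shape of the plan above concrete, here is a minimal Python sketch (an illustration, not Drill's implementation) of what each stage does: hash-distribute rows on that window's partition-by column (the HashToRandomExchange), sort each fragment on (partition-by, order-by), then evaluate a RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW aggregate. The helper names, the toy rows, and the use of Python's built-in hash() in place of Drill's hash function are all assumptions. Because each window gets its own exchange on its own key, running the stages with one fragment or with several should produce the same rows.

```python
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]


def hash_to_random_exchange(rows: Sequence[Row], key: str, num_fragments: int) -> List[List[Row]]:
    """Send each row to the fragment picked by hashing its partition-by column.

    Rows sharing a partition key always land in the same fragment. Python's
    hash() is an assumed stand-in for Drill's own hash function.
    """
    fragments: List[List[Row]] = [[] for _ in range(num_fragments)]
    for row in rows:
        fragments[hash(row[key]) % num_fragments].append(row)
    return fragments


def window_range(rows: Sequence[Row], part: str, order: str, arg: str, out: str,
                 agg: Callable[[Any, Any], Any]) -> List[Row]:
    """agg(arg) OVER (PARTITION BY part ORDER BY order
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), for agg = min or max."""
    # Sort on (partition-by, order-by), like Sort(sort0=..., sort1=...) in the plan.
    ordered = sorted(rows, key=lambda r: (r[part], r[order]))
    result: List[Row] = []
    running = None
    prev_part: Any = object()  # sentinel that never equals a real key
    i = 0
    while i < len(ordered):
        cur_part, cur_ord = ordered[i][part], ordered[i][order]
        if cur_part != prev_part:
            running, prev_part = None, cur_part  # new partition: reset the frame
        # A RANGE frame ending at CURRENT ROW includes all peers (same order key).
        j = i
        while j < len(ordered) and (ordered[j][part], ordered[j][order]) == (cur_part, cur_ord):
            j += 1
        for r in ordered[i:j]:
            running = r[arg] if running is None else agg(running, r[arg])
        result.extend({**r, out: running} for r in ordered[i:j])
        i = j
    return result


def run_stage(rows: Sequence[Row], part: str, order: str, arg: str, out: str,
              agg: Callable[[Any, Any], Any], num_fragments: int) -> List[Row]:
    """One plan stage: exchange on the partition-by column, then sort + window per fragment."""
    output: List[Row] = []
    for fragment in hash_to_random_exchange(rows, part, num_fragments):
        output.extend(window_range(fragment, part, order, arg, out, agg))
    return output


# Hypothetical toy rows; column names loosely follow the lineitem plan.
rows = [
    {"partkey": 1, "suppkey": 10, "orderkey": 100, "price": 5.0},
    {"partkey": 1, "suppkey": 20, "orderkey": 100, "price": 3.0},
    {"partkey": 1, "suppkey": 20, "orderkey": 200, "price": 7.0},
    {"partkey": 2, "suppkey": 10, "orderkey": 200, "price": 4.0},
]


def plan(num_fragments: int) -> List[Row]:
    # Inner window: MIN(price), exchange and partition on partkey.
    stage1 = run_stage(rows, "partkey", "suppkey", "price", "x", min, num_fragments)
    # Outer window: MAX(price), a second exchange and partition on orderkey.
    return run_stage(stage1, "orderkey", "suppkey", "price", "y", max, num_fragments)


single = plan(1)
parallel = plan(4)
```

Comparing `single` with `parallel` (after sorting both) is a quick way to see that re-distributing on each window's own partition key preserves the results.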

> Disable multiple partition by clauses in the same sql query
> -----------------------------------------------------------
>
>                 Key: DRILL-3196
>                 URL: https://issues.apache.org/jira/browse/DRILL-3196
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Victoria Markman
>            Assignee: Sean Hsuan-Yi Chu
>            Priority: Critical
>              Labels: window_function
>
> Currently these queries parse and execute, but the plan does not look correct.
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for  select sum(a2) over(partition by a2 order by a2), count(*) over(partition by a2,b2,c2)  from t2 order by 1,2;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1])
> 00-02        SelectionVectorRemover
> 00-03          Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-04            Project(EXPR$0=[CASE(>($3, 0), CAST($4):ANY, null)], EXPR$1=[$5])
> 00-05              Window(window#0=[window(partition {0, 1, 2} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [COUNT()])])
> 00-06                SelectionVectorRemover
> 00-07                  Sort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC])
> 00-08                    Window(window#0=[window(partition {0} order by [0] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])])
> 00-09                      SelectionVectorRemover
> 00-10                        Sort(sort0=[$0], sort1=[$0], dir0=[ASC], dir1=[ASC])
> 00-11                          Project(a2=[$1], b2=[$0], c2=[$2])
> 00-12                            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`a2`, `b2`, `c2`]]])
> {code}
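One way to judge whether a plan like the quoted one is correct is to compare its output with a straightforward reference evaluation of the query. The Python sketch below is illustrative only: table t2 is replaced by hypothetical in-memory (a2, b2, c2) tuples, the function names are made up, and placing NULLs last in ORDER BY 1, 2 is an assumption rather than confirmed Drill behavior. It mirrors how the quoted plan computes SUM: the Window operator produces COUNT($0) and $SUM0($0), and the Project returns $SUM0 only when the count is positive, otherwise null.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for t2: (a2, b2, c2) tuples; a2 may be NULL (None).
Row = Tuple[Optional[int], int, int]


def sum_over_a2(rows: List[Row]) -> List[Optional[int]]:
    """SUM(a2) OVER (PARTITION BY a2 ORDER BY a2) with the default RANGE frame.

    All rows in a partition share the same a2, so they are all peers and the
    frame is the whole partition. SUM is rebuilt as in the plan's Project:
    $SUM0 when COUNT(a2) > 0, otherwise NULL.
    """
    count: Dict[Optional[int], int] = defaultdict(int)
    sum0: Dict[Optional[int], int] = defaultdict(int)
    for a2, _, _ in rows:
        if a2 is not None:  # COUNT(a2) and $SUM0(a2) ignore NULL inputs
            count[a2] += 1
            sum0[a2] += a2
    return [sum0[a2] if count[a2] > 0 else None for a2, _, _ in rows]


def count_over_abc(rows: List[Row]) -> List[int]:
    """COUNT(*) OVER (PARTITION BY a2, b2, c2): no ORDER BY, so the frame is the whole partition."""
    sizes = Counter(rows)
    return [sizes[r] for r in rows]


def reference_result(rows: List[Row]) -> List[Tuple[Optional[int], int]]:
    """Expected rows for ... ORDER BY 1, 2 (NULLs placed last: an assumption)."""
    pairs = list(zip(sum_over_a2(rows), count_over_abc(rows)))
    return sorted(pairs, key=lambda p: (p[0] is None, p[0] or 0, p[1]))


rows: List[Row] = [(1, 1, 1), (1, 1, 1), (1, 2, 1), (2, 1, 1), (None, 1, 1)]
expected = reference_result(rows)
```

The all-NULL partition shows why the plan wraps $SUM0 in a CASE: $SUM0 alone would return 0 there, while SUM must return null.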



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
