drill-issues mailing list archives

From "Hao Zhu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
Date Wed, 26 Aug 2015 00:29:45 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712244#comment-14712244 ]

Hao Zhu commented on DRILL-3710:
--------------------------------

a. No optimization: with 19 values in the in-list, the plan keeps a Filter with an OR chain.
{code}
explain plan for
select count(1) from h1_passwords where cast(col2 as int) in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-02        Project($f0=[1])
00-03          SelectionVectorRemover
00-04            Filter(condition=[OR(=(CAST($0):INTEGER, 1), =(CAST($0):INTEGER, 2), =(CAST($0):INTEGER,
3), =(CAST($0):INTEGER, 4), =(CAST($0):INTEGER, 5), =(CAST($0):INTEGER, 6), =(CAST($0):INTEGER,
7), =(CAST($0):INTEGER, 8), =(CAST($0):INTEGER, 9), =(CAST($0):INTEGER, 10), =(CAST($0):INTEGER,
11), =(CAST($0):INTEGER, 12), =(CAST($0):INTEGER, 13), =(CAST($0):INTEGER, 14), =(CAST($0):INTEGER,
15), =(CAST($0):INTEGER, 16), =(CAST($0):INTEGER, 17), =(CAST($0):INTEGER, 18), =(CAST($0):INTEGER,
19))])
00-05              Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h1_passwords),
inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], columns=[`col2`],
partitions= null]])
{code}
b. With optimization: with 20 values in the in-list, the plan joins the scan against a Values input.
{code}
explain plan for
select count(1) from h1_passwords where cast(col2 as int) in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-02        Project($f0=[1])
00-03          Project(f6=[$1], ROW_VALUE=[$0])
00-04            MergeJoin(condition=[=($1, $0)], joinType=[inner])
00-06              SelectionVectorRemover
00-08                Sort(sort0=[$0], dir0=[ASC])
00-10                  HashAgg(group=[{0}])
00-12                    Values
00-05              SelectionVectorRemover
00-07                Sort(sort0=[$0], dir0=[ASC])
00-09                  Project(f6=[CAST($0):INTEGER])
00-11                    Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h1_passwords),
inputSplits=[maprfs:///user/hive/warehouse/h1_passwords/passwd:0+1680], columns=[`col2`],
partitions= null]])
{code}
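
The difference between the two plans can be modelled in a few lines. This is a minimal sketch, not Drill code: the function names and the IN_LIST_THRESHOLD constant are hypothetical, the value 20 is taken only from the behaviour observed in the two plans, and a Python set lookup stands in for the HashAgg + Sort + MergeJoin pipeline in plan (b).

```python
# Hypothetical model of the two plans; none of these names exist in Drill.
IN_LIST_THRESHOLD = 20  # assumption from the plans: 19 values -> Filter, 20 -> join


def count_with_filter(rows, in_list):
    """Plan (a): compare each row against every literal, like the OR chain."""
    return sum(1 for value in rows if any(value == lit for lit in in_list))


def count_with_join(rows, in_list):
    """Plan (b): deduplicate the literals, then match each row against them."""
    distinct = set(in_list)  # plays the role of HashAgg over Values
    return sum(1 for value in rows if value in distinct)


def count_in_list(rows, in_list):
    """Choose a plan by in-list length and return (plan name, row count)."""
    if len(in_list) >= IN_LIST_THRESHOLD:
        return "join", count_with_join(rows, in_list)
    return "filter", count_with_filter(rows, in_list)
```

Both paths return the same count for the same in-list; only the amount of per-row comparison work differs, which is why the length threshold matters.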

> Make the 20 in-list optimization configurable
> ---------------------------------------------
>
>                 Key: DRILL-3710
>                 URL: https://issues.apache.org/jira/browse/DRILL-3710
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 1.1.0
>            Reporter: Hao Zhu
>            Assignee: Jinfeng Ni
>
> If an in-list has 20 or more values, Drill can optimize the query by converting the in-list
> into a small hash table in memory and then doing a table join instead.
> This can improve the performance of queries with long in-lists.
> Could we make "20" configurable, so that we do not need to add duplicate/junk values to an
> in-list to reach 20?
> A sample query is:
> select count(*) from table where col in (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
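
The padding workaround described in the issue could be scripted roughly as follows. This is a sketch under assumptions: pad_in_list and build_query are hypothetical helpers, the default threshold of 20 comes from the issue text, and the values are assumed to be trusted integers, so no SQL escaping is done.

```python
def pad_in_list(values, threshold=20):
    """Repeat the last value until the list has at least `threshold` entries."""
    if not values:
        raise ValueError("in-list must not be empty")
    padding = [values[-1]] * max(0, threshold - len(values))
    return list(values) + padding


def build_query(table, column, values, threshold=20):
    """Build a count(*) query whose in-list is padded with duplicates."""
    literals = ",".join(str(v) for v in pad_in_list(values, threshold))
    return f"select count(*) from {table} where {column} in ({literals});"
```

Duplicates do not change the set of matching rows, so the padded query returns the same count as the unpadded one. A configurable threshold would make this padding unnecessary.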



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
