drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
Date Fri, 22 Jul 2016 18:27:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389992#comment-15389992 ]

ASF GitHub Bot commented on DRILL-3710:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/552#discussion_r71923747
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestPartitionFilter.java ---
    @@ -376,4 +376,14 @@ public void testPartitionFilterWithLike() throws Exception {
         testIncludeFilter(query4, 4, "Filter", 16);
       }
     
    +  @Test //DRILL-3710 Partition pruning should occur with varying IN-LIST size
    +  public void testPartitionFilterWithInSubquery() throws Exception {
    +    String query = String.format("select * from dfs_test.`%s/multilevel/parquet` where cast (dir0 as int) IN (1994, 1994, 1994, 1994, 1994, 1994)", TEST_RES_PATH);
    +    /* In list size exceeds threshold - no partition pruning since predicate converted to join */
    +    test("alter session set `planner.in_subquery_threshold` = 2");
    --- End diff --
    
    Not sure if it is necessary to check the no-partition-pruning case. Basically, the goal of the test is to see if partition pruning works with large IN lists.


> Make the 20 in-list optimization configurable
> ---------------------------------------------
>
>                 Key: DRILL-3710
>                 URL: https://issues.apache.org/jira/browse/DRILL-3710
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 1.1.0
>            Reporter: Hao Zhu
>            Assignee: Gautam Kumar Parai
>             Fix For: Future
>
>
> If an IN list has more than 20 values, Drill can optimize the query by converting the IN list into a small in-memory hash table and then doing a join instead.
> This can improve the performance of queries with large IN lists.
> Could we make "20" configurable, so that we do not need to pad the IN list with duplicate/junk values to push it past 20?
> Sample query:
> select count(*) from table where col in (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
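
For illustration only, the behavior described in the issue can be modeled with a toy Java sketch. This is not Drill or Calcite planner code: the class `InListPlanSketch`, the enum `Plan`, and both helper methods are hypothetical names, and treating "exceeds the threshold" as strictly greater than is an assumption taken from the wording above (the real planner's boundary condition may differ). The only name taken from the thread is the session option `planner.in_subquery_threshold`, which appears in the quoted diff.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT Drill's or Calcite's planner code.
final class InListPlanSketch {

    // The two outcomes described in the issue: keep the IN list as a filter,
    // or convert it to an in-memory hash table and join against it.
    enum Plan { OR_FILTER, HASH_JOIN }

    // Assumption: a list "exceeds" the threshold when its size is strictly
    // greater than it; the real planner's boundary condition may differ.
    static Plan choose(int inListSize, int threshold) {
        return inListSize > threshold ? Plan.HASH_JOIN : Plan.OR_FILTER;
    }

    // Hypothetical helper: renders the session option used in the quoted diff.
    static String alterSession(int threshold) {
        return String.format(
            "alter session set `planner.in_subquery_threshold` = %d", threshold);
    }

    public static void main(String[] args) {
        // Same six-value IN list as in the quoted test.
        List<Integer> values = Arrays.asList(1994, 1994, 1994, 1994, 1994, 1994);
        for (int threshold : new int[] {2, 20}) {
            System.out.println(
                alterSession(threshold) + " -> " + choose(values.size(), threshold));
        }
    }
}
```

Under this model, a configurable threshold removes the need to pad the IN list: lowering the option below the list size selects the join path, and raising it above the list size keeps the filter path.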



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
