drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4147) Union All operator runs in a single fragment
Date Mon, 01 Aug 2016 21:16:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402837#comment-15402837
] 

ASF GitHub Bot commented on DRILL-4147:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/555#discussion_r73054394
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestUnionAll.java ---
    @@ -1013,4 +1019,97 @@ public void testUnionAllInWith() throws Exception {
             .build()
             .run();
       }
    +
    +  @Test // DRILL-4147 // base case
    +  public void testDrill4147_1() throws Exception {
    +    final String l = FileUtils.getResourceAsFile("/multilevel/parquet/1994").toURI().toString();
    +    final String r = FileUtils.getResourceAsFile("/multilevel/parquet/1995").toURI().toString();
    +
    +    final String query = String.format("SELECT o_custkey FROM dfs_test.`%s` \n" +
    +        "Union All SELECT o_custkey FROM dfs_test.`%s`", l, r);
    +
    +    // Validate the plan
    +    final String[] expectedPlan = {"UnionExchange.*\n",
    +        ".*Project.*\n" +
    +        ".*UnionAll"};
    +    final String[] excludedPlan = {};
    +
    +    test(sliceTargetSmall);
    +    PlanTestBase.testPlanMatchingPatterns(query, expectedPlan, excludedPlan);
    +
    +    try {
    +      testBuilder()
    +        .optionSettingQueriesForTestQuery(sliceTargetSmall)
    +        .optionSettingQueriesForBaseline(sliceTargetDefault)
    +        .unOrdered()
    +        .sqlQuery(query)
    +        .sqlBaselineQuery(query)
    +        .build()
    +        .run();
    +    } finally {
    +      test(sliceTargetDefault);
    +    }
    +  }
    +
    +  @Test // DRILL-4147  // group-by on top of union-all
    +  public void testDrill4147_2() throws Exception {
    +    final String l = FileUtils.getResourceAsFile("/multilevel/parquet/1994").toURI().toString();
    +    final String r = FileUtils.getResourceAsFile("/multilevel/parquet/1995").toURI().toString();
    +
    +    final String query = String.format("Select o_custkey, count(*) as cnt from \n" +
    +        " (SELECT o_custkey FROM dfs_test.`%s` \n" +
    +        "Union All SELECT o_custkey FROM dfs_test.`%s`) \n" +
    +        "group by o_custkey", l, r);
    +
    +    // Validate the plan
    +    final String[] expectedPlan = {"(?s)UnionExchange.*HashAgg.*HashToRandomExchange.*UnionAll.*"};
    +    final String[] excludedPlan = {};
    +
    +    test(sliceTargetSmall);
    --- End diff --
    
    Yes, I will move this and other such calls within the try block.  Thanks for pointing
out. 


> Union All operator runs in a single fragment
> --------------------------------------------
>
>                 Key: DRILL-4147
>                 URL: https://issues.apache.org/jira/browse/DRILL-4147
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: amit hadke
>            Assignee: Aman Sinha
>
> A User noticed that running select  from a single directory is much faster than union
all on two directories.
> (https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/#comment-2349732267)

> It seems like UNION ALL operator doesn't parallelize sub scans (its using SINGLETON for
distribution type). Everything is ran in single fragment.
> We may have to use SubsetTransformer in UnionAllPrule.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message