drill-issues mailing list archives

From "Sean Hsuan-Yi Chu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2376) UNION ALL on Aggregates with GROUP BY returns incomplete results
Date Fri, 24 Apr 2015 18:40:39 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511502#comment-14511502 ]

Sean Hsuan-Yi Chu commented on DRILL-2376:
------------------------------------------

In fact, the issue resides in StreamAgg, which gives Union-All the wrong information about
the schema change.

You can reproduce the issue with the physical plan below. The plan is equivalent to the query
"select sss from (select sum(1) as sss from cp.`tpch/nation.parquet`) group by sss",
but Calcite would not choose this plan if the SQL were submitted directly.

{
  "head" : {
    "version" : 1,
    "generator" : {
      "type" : "ExplainHandler",
      "info" : ""
    },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ {
      "name" : "planner.enable_streamagg",
      "kind" : "BOOLEAN",
      "type" : "SESSION",
      "bool_val" : true
    } ],
    "queue" : 0,
    "resultMode" : "EXEC"
  },
  "graph" : [ {
    "pop" : "parquet-scan",
    "@id" : 4,
    "userName" : "hyichu",
    "entries" : [ {
      "path" : "/tpch/nation.parquet"
    } ],
    "storage" : {
      "type" : "file",
      "enabled" : true,
      "connection" : "classpath:///",
      "workspaces" : null,
      "formats" : {
        "csv" : {
          "type" : "text",
          "extensions" : [ "csv" ],
          "delimiter" : ","
        },
        "tsv" : {
          "type" : "text",
          "extensions" : [ "tsv" ],
          "delimiter" : "\t"
        },
        "json" : {
          "type" : "json"
        },
        "parquet" : {
          "type" : "parquet"
        },
        "avro" : {
          "type" : "avro"
        }
      }
    },
    "format" : {
      "type" : "parquet"
    },
    "columns" : [ "`*`" ],
    "selectionRoot" : "/tpch/nation.parquet",
    "cost" : 25.0
  }, {
    "pop" : "project",
    "@id" : 3,
    "exprs" : [ {
      "ref" : "`$f0`",
      "expr" : "1"
    } ],
    "child" : 4,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 25.0
  }, {
    "pop" : "streaming-aggregate",
    "@id" : 2,
    "child" : 3,
    "keys" : [ ],
    "exprs" : [ {
      "ref" : "`sss`",
      "expr" : "sum(`$f0`) "
    } ],
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 1.0
  }, {
    "pop" : "hash-aggregate",
    "@id" : 1,
    "child" : 2,
    "cardinality" : 1.0,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "groupByExprs" : [ {
      "ref" : "`sss`",
      "expr" : "`sss`"
    } ],
    "aggrExprs" : [ ],
    "cost" : 12.5
  }, {
    "pop" : "screen",
    "@id" : 0,
    "child" : 1,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 1.0
  } ]
} 
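
For orientation, the operator chain in the plan above can be read off by following the
"child" references. The sketch below is illustrative Python, not part of Drill: the helper
name operator_chain and the trimmed PLAN_GRAPH list are made up for this example, and only
the "pop", "@id" and "child" fields are copied from the plan.

```python
import json

# Trimmed copy of the plan's "graph" array: only the fields needed to
# follow the operator chain are kept.
PLAN_GRAPH = json.loads("""
[
  {"pop": "parquet-scan",        "@id": 4},
  {"pop": "project",             "@id": 3, "child": 4},
  {"pop": "streaming-aggregate", "@id": 2, "child": 3},
  {"pop": "hash-aggregate",      "@id": 1, "child": 2},
  {"pop": "screen",              "@id": 0, "child": 1}
]
""")


def operator_chain(graph):
    """Return operator names from the root operator down to the leaf scan."""
    by_id = {op["@id"]: op for op in graph}
    child_ids = {op["child"] for op in graph if "child" in op}
    # The root is the only operator that no other operator lists as its child.
    roots = [op for op in graph if op["@id"] not in child_ids]
    if len(roots) != 1:
        raise ValueError("expected exactly one root operator")
    chain = []
    node = roots[0]
    while node is not None:
        chain.append(node["pop"])
        # Leaf operators have no "child" key, so the lookup yields None.
        node = by_id.get(node.get("child"))
    return chain


print(" <- ".join(operator_chain(PLAN_GRAPH)))
```

The printed chain shows hash-aggregate consuming the output of streaming-aggregate
directly, which is the part of the plan that matters for this issue.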

> UNION ALL on Aggregates with GROUP BY returns incomplete results
> ----------------------------------------------------------------
>
>                 Key: DRILL-2376
>                 URL: https://issues.apache.org/jira/browse/DRILL-2376
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0
>            Reporter: Abhishek Girish
>            Assignee: Sean Hsuan-Yi Chu
>             Fix For: 0.8.0
>
>         Attachments: t1.parquet, t2.parquet
>
>
> The following query returns incomplete results:
> {code:sql}
> select x
> from
> (SELECT Sum(ss_ext_sales_price) x
> FROM  store_sales
> UNION ALL
> SELECT Sum(cs_ext_sales_price) x
> FROM catalog_sales) tmp
> GROUP BY x;
> Results from Drill:
> +------------+
> |     x      |
> +------------+
> | 3658019159.35 |
> +------------+
> 1 row selected (3.474 seconds)
> Results from Postgres:
>        x       
> ---------------
>  5265207074.51
>  3658019159.35
> (2 rows)
> {code}
> Removing GROUP BY returns the right results:
> {code:sql}
> select x
> from
> (SELECT Sum(ss_ext_sales_price) x
> FROM  store_sales
> UNION ALL
> SELECT Sum(cs_ext_sales_price) x
> FROM catalog_sales) tmp;
> Results from Drill:
> +------------+
> |     x      |
> +------------+
> | 5265207074.51 |
> | 3658019159.35 |
> +------------+
> {code}
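
As a sanity check on what the query should return, the expected semantics can be sketched
in a few lines of Python. This is only an illustration of UNION ALL followed by GROUP BY on
the two sums reported above, not Drill code; the names branch_a, branch_b, union_all and
group_by_value are hypothetical, and the report does not say which sum belongs to which table.

```python
from decimal import Decimal

# The two aggregate values reported in the issue, one per UNION ALL branch.
branch_a = [Decimal("5265207074.51")]
branch_b = [Decimal("3658019159.35")]


def union_all(*branches):
    """Concatenate the rows of every branch, keeping duplicates."""
    rows = []
    for branch in branches:
        rows.extend(branch)
    return rows


def group_by_value(rows):
    """Return one row per distinct value, in first-seen order."""
    groups = []
    for row in rows:
        if row not in groups:
            groups.append(row)
    return groups


result = group_by_value(union_all(branch_a, branch_b))
for x in result:
    print(x)
```

Because the two sums differ, grouping by x should keep both rows, which matches the
Postgres output rather than the single row returned by Drill.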



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
