hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-15637) Hive/Druid integration: wrong semantics of groupBy query limit with granularity
Date Mon, 16 Jan 2017 11:20:26 GMT

     [ https://issues.apache.org/jira/browse/HIVE-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-15637:
-------------------------------------------
    Description: 
Similar to HIVE-15636, but for GroupBy queries. Limit is applied per granularity unit, not
globally for the query.

{code:sql}
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s
LIMIT 10;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Map 1 vectorized
      File Output Operator [FS_4]
        Select Operator [SEL_3] (rows=15888 width=0)
          Output:["_col0","_col1","_col2","_col3"]
          TableScan [TS_0] (rows=15888 width=0)
            tpcds_druid_10@store_sales_sold_time_subset,store_sales_sold_time_subset,Tbl:PARTIAL,Col:NONE,Output:["i_brand_id","__time","$f2","$f3"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_tpcds_ss_sold_time_subset\",\"granularity\":\"DAY\",\"dimensions\":[\"i_brand_id\"],\"limitSpec\":{\"type\":\"default\",\"limit\":10,\"columns\":[{\"dimension\":\"$f3\",\"direction\":\"ascending\"}]},\"aggregations\":[{\"type\":\"longMax\",\"name\":\"$f2\",\"fieldName\":\"ss_quantity\"},{\"type\":\"doubleSum\",\"name\":\"$f3\",\"fieldName\":\"ss_wholesale_cost\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
{code}
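
To illustrate the difference the description points at, here is a minimal Python sketch contrasting a limit applied once over the whole result (what the SQL query asks for) with a limit applied inside each day bucket (the behaviour this issue reports). The rows, dates, and function names are hypothetical and only model the semantics; this is not Druid or Hive code.

```python
from collections import defaultdict

# Hypothetical aggregated rows: (i_brand_id, day, s), where s stands in
# for sum(ss_wholesale_cost). Values are made up for illustration.
rows = [
    (1, "2017-01-01", 5.0),
    (2, "2017-01-01", 1.0),
    (3, "2017-01-01", 3.0),
    (1, "2017-01-02", 2.0),
    (2, "2017-01-02", 4.0),
    (3, "2017-01-02", 6.0),
]

def limit_globally(rows, limit):
    """SQL semantics: ORDER BY s LIMIT n over the whole result."""
    return sorted(rows, key=lambda r: r[2])[:limit]

def limit_per_granularity(rows, limit):
    """Reported behaviour: the limit is applied inside each day bucket."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[1]].append(row)
    result = []
    for day in sorted(buckets):
        result.extend(sorted(buckets[day], key=lambda r: r[2])[:limit])
    return result

global_rows = limit_globally(rows, 2)
per_day_rows = limit_per_granularity(rows, 2)
print(len(global_rows), len(per_day_rows))  # prints "2 4"
```

With two day buckets and a limit of 2, the per-bucket variant returns up to 2 rows per day (4 in total here), whereas the query as written expects only the 2 smallest values of s overall.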

  was:
Similar to HIVE-15635, but for GroupBy queries. Limit is applied per granularity unit, not
globally for the query.

{code:sql}
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s
LIMIT 10;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Map 1 vectorized
      File Output Operator [FS_4]
        Select Operator [SEL_3] (rows=15888 width=0)
          Output:["_col0","_col1","_col2","_col3"]
          TableScan [TS_0] (rows=15888 width=0)
            tpcds_druid_10@store_sales_sold_time_subset,store_sales_sold_time_subset,Tbl:PARTIAL,Col:NONE,Output:["i_brand_id","__time","$f2","$f3"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_tpcds_ss_sold_time_subset\",\"granularity\":\"DAY\",\"dimensions\":[\"i_brand_id\"],\"limitSpec\":{\"type\":\"default\",\"limit\":10,\"columns\":[{\"dimension\":\"$f3\",\"direction\":\"ascending\"}]},\"aggregations\":[{\"type\":\"longMax\",\"name\":\"$f2\",\"fieldName\":\"ss_quantity\"},{\"type\":\"doubleSum\",\"name\":\"$f3\",\"fieldName\":\"ss_wholesale_cost\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
{code}


> Hive/Druid integration: wrong semantics of groupBy query limit with granularity
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-15637
>                 URL: https://issues.apache.org/jira/browse/HIVE-15637
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>
> Similar to HIVE-15636, but for GroupBy queries. Limit is applied per granularity unit,
> not globally for the query.
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s
> LIMIT 10;
> OK
> Plan optimized by CBO.
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Map 1 vectorized
>       File Output Operator [FS_4]
>         Select Operator [SEL_3] (rows=15888 width=0)
>           Output:["_col0","_col1","_col2","_col3"]
>           TableScan [TS_0] (rows=15888 width=0)
>             tpcds_druid_10@store_sales_sold_time_subset,store_sales_sold_time_subset,Tbl:PARTIAL,Col:NONE,Output:["i_brand_id","__time","$f2","$f3"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_tpcds_ss_sold_time_subset\",\"granularity\":\"DAY\",\"dimensions\":[\"i_brand_id\"],\"limitSpec\":{\"type\":\"default\",\"limit\":10,\"columns\":[{\"dimension\":\"$f3\",\"direction\":\"ascending\"}]},\"aggregations\":[{\"type\":\"longMax\",\"name\":\"$f2\",\"fieldName\":\"ss_quantity\"},{\"type\":\"doubleSum\",\"name\":\"$f3\",\"fieldName\":\"ss_wholesale_cost\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
