hive-issues mailing list archives

From "Pengcheng Xiong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16421) Runtime filtering breaks user-level explain
Date Thu, 20 Apr 2017 17:40:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-16421:
-----------------------------------
    Attachment: HIVE-16421.02.patch

> Runtime filtering breaks user-level explain
> -------------------------------------------
>
>                 Key: HIVE-16421
>                 URL: https://issues.apache.org/jira/browse/HIVE-16421
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-16421.01.patch, HIVE-16421.02.patch
>
>
> Query:
> {noformat}
> SELECT LAG(COALESCE(t2.int_col_14, t1.int_col_80),22) OVER (ORDER BY t1.tinyint_col_52 DESC) AS int_col FROM table_6 t1 INNER JOIN table_14 t2 ON ((t2.decimal0101_col_55) = (t1.decimal0101_col_9));
> {noformat}
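> The two plans that follow can presumably be reproduced by running the same EXPLAIN with runtime filtering off and then on. A minimal sketch, assuming that "runtime filtering" here means Tez dynamic semijoin reduction (hive.tez.dynamic.semijoin.reduction) and that user-level explain is enabled through hive.explain.user; this report does not name either setting, so treat both as assumptions:
> {noformat}
> -- Assumption: user-level explain is enabled through this property.
> SET hive.explain.user=true;
> -- Assumption: runtime filtering corresponds to dynamic semijoin reduction.
> SET hive.tez.dynamic.semijoin.reduction=false;
> EXPLAIN SELECT LAG(COALESCE(t2.int_col_14, t1.int_col_80),22) OVER (ORDER BY t1.tinyint_col_52 DESC) AS int_col FROM table_6 t1 INNER JOIN table_14 t2 ON ((t2.decimal0101_col_55) = (t1.decimal0101_col_9));
> -- Same query with runtime filtering switched on.
> SET hive.tez.dynamic.semijoin.reduction=true;
> EXPLAIN SELECT LAG(COALESCE(t2.int_col_14, t1.int_col_80),22) OVER (ORDER BY t1.tinyint_col_52 DESC) AS int_col FROM table_6 t1 INNER JOIN table_14 t2 ON ((t2.decimal0101_col_55) = (t1.decimal0101_col_9));
> {noformat}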
> Without runtime filtering:
> {noformat}
> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> |                                                                                                           Explain                                                                                                           |
> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> | Plan not optimized by CBO.                                                                                                                                                                                                  |
> |                                                                                                                                                                                                                             |
> | Vertex dependency in root stage                                                                                                                                                                                             |
> | Map 1 <- Map 3 (BROADCAST_EDGE)                                                                                                                                                                                             |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)                                                                                                                                                                                            |
> |                                                                                                                                                                                                                             |
> | Stage-0                                                                                                                                                                                                                     |
> |    Fetch Operator                                                                                                                                                                                                           |
> |       limit:-1                                                                                                                                                                                                              |
> |       Stage-1                                                                                                                                                                                                               |
> |          Reducer 2                                                                                                                                                                                                          |
> |          File Output Operator [FS_364]                                                                                                                                                                                      |
> |             compressed:false                                                                                                                                                                                                |
> |             Statistics:Num rows: 74781721 Data size: 299126884 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                 |
> |             table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}  |
> |             Select Operator [SEL_362]                                                                                                                                                                                       |
> |                outputColumnNames:["_col0"]                                                                                                                                                                                  |
> |                Statistics:Num rows: 74781721 Data size: 299126884 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                              |
> |                PTF Operator [PTF_361]                                                                                                                                                                                       |
> |                   Function definitions:[{"Input definition":{"type:":"WINDOWING"}},{"order by:":"_col51(DESC)","name:":"windowingtablefunction","partition by:":"0"}]                                                       |
> |                   Statistics:Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                           |
> |                   Select Operator [SEL_360]                                                                                                                                                                                 |
> |                   |  outputColumnNames:["_col51","_col79","_col97"]                                                                                                                                                         |
> |                   |  Statistics:Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                        |
> |                   |<-Map 1 [SIMPLE_EDGE] vectorized                                                                                                                                                                         |
> |                      Reduce Output Operator [RS_375]                                                                                                                                                                        |
> |                         key expressions:0 (type: int), _col51 (type: tinyint)                                                                                                                                               |
> |                         Map-reduce partition columns:0 (type: int)                                                                                                                                                          |
> |                         sort order:+-                                                                                                                                                                                       |
> |                         Statistics:Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                     |
> |                         value expressions:_col79 (type: int), _col97 (type: int)                                                                                                                                            |
> |                         Map Join Operator [MAPJOIN_374]                                                                                                                                                                     |
> |                         |  condition map:[{"":"Inner Join 0 to 1"}]                                                                                                                                                         |
> |                         |  HybridGraceHashJoin:true                                                                                                                                                                         |
> |                         |  keys:{"Map 3":"decimal0101_col_55 (type: decimal(1,1))","Map 1":"decimal0101_col_9 (type: decimal(1,1))"}                                                                                        |
> |                         |  outputColumnNames:["_col51","_col79","_col97"]                                                                                                                                                   |
> |                         |  Statistics:Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                  |
> |                         |<-Map 3 [BROADCAST_EDGE] vectorized                                                                                                                                                                |
> |                         |  Reduce Output Operator [RS_372]                                                                                                                                                                  |
> |                         |     key expressions:decimal0101_col_55 (type: decimal(1,1))                                                                                                                                       |
> |                         |     Map-reduce partition columns:decimal0101_col_55 (type: decimal(1,1))                                                                                                                          |
> |                         |     sort order:+                                                                                                                                                                                  |
> |                         |     Statistics:Num rows: 26256 Data size: 2749496 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                    |
> |                         |     value expressions:int_col_14 (type: int)                                                                                                                                                      |
> |                         |     Filter Operator [FIL_371]                                                                                                                                                                     |
> |                         |        predicate:decimal0101_col_55 is not null (type: boolean)                                                                                                                                   |
> |                         |        Statistics:Num rows: 26256 Data size: 2749496 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                 |
> |                         |        TableScan [TS_353]                                                                                                                                                                         |
> |                         |           alias:t2                                                                                                                                                                                |
> |                         |           Statistics:Num rows: 29079 Data size: 117014275 Basic stats: COMPLETE Column stats: COMPLETE                                                                                            |
> |                         |<-Filter Operator [FIL_373]                                                                                                                                                                        |
> |                               predicate:decimal0101_col_9 is not null (type: boolean)                                                                                                                                       |
> |                               Statistics:Num rows: 48419 Data size: 5233788 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                    |
> |                               TableScan [TS_352]                                                                                                                                                                            |
> |                                  alias:t1                                                                                                                                                                                   |
> |                                  Statistics:Num rows: 53742 Data size: 200230374 Basic stats: COMPLETE Column stats: COMPLETE                                                                                               |
> |                                                                                                                                                                                                                             |
> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> {noformat}
> With runtime filtering:
> {noformat}
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> |                                                                                                                                                 Explain                                                                                                                                                  |
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> | STAGE DEPENDENCIES:                                                                                                                                                                                                                                                                                      |
> |   Stage-1 is a root stage                                                                                                                                                                                                                                                                                |
> |   Stage-0 depends on stages: Stage-1                                                                                                                                                                                                                                                                     |
> |                                                                                                                                                                                                                                                                                                          |
> | STAGE PLANS:                                                                                                                                                                                                                                                                                             |
> |   Stage: Stage-1                                                                                                                                                                                                                                                                                         |
> |     Tez                                                                                                                                                                                                                                                                                                  |
> |       DagId: hive_20170411232247_e177745a-39d0-4ae7-8ca0-871a137b36fa:1                                                                                                                                                                                                                                  |
> |       Edges:                                                                                                                                                                                                                                                                                             |
> |         Map 1 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE)                                                                                                                                                                                                                                      |
> |         Reducer 2 <- Map 1 (SIMPLE_EDGE)                                                                                                                                                                                                                                                                 |
> |         Reducer 4 <- Map 3 (SIMPLE_EDGE)                                                                                                                                                                                                                                                                 |
> |       DagName:                                                                                                                                                                                                                                                                                           |
> |       Vertices:                                                                                                                                                                                                                                                                                          |
> |         Map 1                                                                                                                                                                                                                                                                                            |
> |             Map Operator Tree:                                                                                                                                                                                                                                                                           |
> |                 TableScan                                                                                                                                                                                                                                                                                |
> |                   alias: t1                                                                                                                                                                                                                                                                              |
> |                   filterExpr: (decimal0101_col_9 is not null and (decimal0101_col_9 BETWEEN DynamicValue(RS_7_t2_decimal0101_col_9_min) AND DynamicValue(RS_7_t2_decimal0101_col_9_max) and in_bloom_filter(decimal0101_col_9, DynamicValue(RS_7_t2_decimal0101_col_9_bloom_filter)))) (type: boolean)   |
> |                   Statistics: Num rows: 53742 Data size: 5809320 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                            |
> |                   Filter Operator                                                                                                                                                                                                                                                                        |
> |                     predicate: (decimal0101_col_9 is not null and (decimal0101_col_9 BETWEEN DynamicValue(RS_7_t2_decimal0101_col_9_min) AND DynamicValue(RS_7_t2_decimal0101_col_9_max) and in_bloom_filter(decimal0101_col_9, DynamicValue(RS_7_t2_decimal0101_col_9_bloom_filter)))) (type: boolean)  |
> |                     Statistics: Num rows: 48419 Data size: 5233908 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                          |
> |                     Select Operator                                                                                                                                                                                                                                                                      |
> |                       expressions: decimal0101_col_9 (type: decimal(1,1)), tinyint_col_52 (type: tinyint), int_col_80 (type: int)                                                                                                                                                                        |
> |                       outputColumnNames: _col0, _col1, _col2                                                                                                                                                                                                                                             |
> |                       Statistics: Num rows: 48419 Data size: 5233908 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                        |
> |                       Map Join Operator                                                                                                                                                                                                                                                                  |
> |                         condition map:                                                                                                                                                                                                                                                                   |
> |                              Inner Join 0 to 1                                                                                                                                                                                                                                                           |
> |                         keys:                                                                                                                                                                                                                                                                            |
> |                           0 _col0 (type: decimal(1,1))                                                                                                                                                                                                                                                   |
> |                           1 _col1 (type: decimal(1,1))                                                                                                                                                                                                                                                   |
> |                         outputColumnNames: _col1, _col2, _col3                                                                                                                                                                                                                                           |
> |                         input vertices:                                                                                                                                                                                                                                                                  |
> |                           1 Map 3                                                                                                                                                                                                                                                                        |
> |                         Statistics: Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                 |
> |                         Reduce Output Operator                                                                                                                                                                                                                                                           |
> |                           key expressions: 0 (type: int), _col1 (type: tinyint)                                                                                                                                                                                                                          |
> |                           sort order: +-                                                                                                                                                                                                                                                                 |
> |                           Map-reduce partition columns: 0 (type: int)                                                                                                                                                                                                                                    |
> |                           Statistics: Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                               |
> |                           value expressions: _col2 (type: int), _col3 (type: int)                                                                                                                                                                                                                        |
> |             Execution mode: vectorized, llap                                                                                                                                                                                                                                                             |
> |         Map 3                                                                                                                                                                                                                                                                                            |
> |             Map Operator Tree:                                                                                                                                                                                                                                                                           |
> |                 TableScan                                                                                                                                                                                                                                                                                |
> |                   alias: t2                                                                                                                                                                                                                                                                              |
> |                   filterExpr: decimal0101_col_55 is not null (type: boolean)                                                                                                                                                                                                                             |
> |                   Statistics: Num rows: 29079 Data size: 3045240 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                            |
> |                   Filter Operator                                                                                                                                                                                                                                                                        |
> |                     predicate: decimal0101_col_55 is not null (type: boolean)                                                                                                                                                                                                                            |
> |                     Statistics: Num rows: 26256 Data size: 2749612 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                          |
> |                     Select Operator                                                                                                                                                                                                                                                                      |
> |                       expressions: int_col_14 (type: int), decimal0101_col_55 (type: decimal(1,1))                                                                                                                                                                                                       |
> |                       outputColumnNames: _col0, _col1                                                                                                                                                                                                                                                    |
> |                       Statistics: Num rows: 26256 Data size: 2749612 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                        |
> |                       Reduce Output Operator                                                                                                                                                                                                                                                             |
> |                         key expressions: _col1 (type: decimal(1,1))                                                                                                                                                                                                                                      |
> |                         sort order: +                                                                                                                                                                                                                                                                    |
> |                         Map-reduce partition columns: _col1 (type: decimal(1,1))                                                                                                                                                                                                                         |
> |                         Statistics: Num rows: 26256 Data size: 2749612 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                      |
> |                         value expressions: _col0 (type: int)                                                                                                                                                                                                                                             |
> |                       Select Operator                                                                                                                                                                                                                                                                    |
> |                         expressions: _col1 (type: decimal(1,1))                                                                                                                                                                                                                                          |
> |                         outputColumnNames: _col0                                                                                                                                                                                                                                                         |
> |                         Statistics: Num rows: 26256 Data size: 2749612 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                      |
> |                         Group By Operator                                                                                                                                                                                                                                                                |
> |                           aggregations: min(_col0), max(_col0), bloom_filter(_col0, expectedEntries=17)                                                                                                                                                                                                  |
> |                           mode: hash                                                                                                                                                                                                                                                                     |
> |                           outputColumnNames: _col0, _col1, _col2                                                                                                                                                                                                                                         |
> |                           Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                            |
> |                           Reduce Output Operator                                                                                                                                                                                                                                                         |
> |                             sort order:                                                                                                                                                                                                                                                                  |
> |                             Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                          |
> |                             value expressions: _col0 (type: decimal(1,1)), _col1 (type: decimal(1,1)), _col2 (type: binary)                                                                                                                                                                              |
> |             Execution mode: vectorized, llap                                                                                                                                                                                                                                                             |
> |         Reducer 2                                                                                                                                                                                                                                                                                        |
> |             Execution mode: llap                                                                                                                                                                                                                                                                         |
> |             Reduce Operator Tree:                                                                                                                                                                                                                                                                        |
> |               Select Operator                                                                                                                                                                                                                                                                            |
> |                 expressions: KEY.reducesinkkey1 (type: tinyint), VALUE._col1 (type: int), VALUE._col2 (type: int)                                                                                                                                                                                        |
> |                 outputColumnNames: _col1, _col2, _col3                                                                                                                                                                                                                                                   |
> |                 Statistics: Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                         |
> |                 PTF Operator                                                                                                                                                                                                                                                                             |
> |                   Function definitions:                                                                                                                                                                                                                                                                  |
> |                       Input definition                                                                                                                                                                                                                                                                   |
> |                         input alias: ptf_0                                                                                                                                                                                                                                                               |
> |                         output shape: _col1: tinyint, _col2: int, _col3: int                                                                                                                                                                                                                             |
> |                         type: WINDOWING                                                                                                                                                                                                                                                                  |
> |                       Windowing table definition                                                                                                                                                                                                                                                         |
> |                         input alias: ptf_1                                                                                                                                                                                                                                                               |
> |                         name: windowingtablefunction                                                                                                                                                                                                                                                     |
> |                         order by: _col1 DESC NULLS LAST                                                                                                                                                                                                                                                  |
> |                         partition by: 0                                                                                                                                                                                                                                                                  |
> |                         raw input shape:                                                                                                                                                                                                                                                                 |
> |                         window functions:                                                                                                                                                                                                                                                                |
> |                             window function definition                                                                                                                                                                                                                                                   |
> |                               alias: LAG_window_0                                                                                                                                                                                                                                                        |
> |                               arguments: COALESCE(_col3,_col2), 22                                                                                                                                                                                                                                       |
> |                               name: LAG                                                                                                                                                                                                                                                                  |
> |                               window function: GenericUDAFLagEvaluator                                                                                                                                                                                                                                   |
> |                               window frame: PRECEDING(MAX)~FOLLOWING(MAX)                                                                                                                                                                                                                                |
> |                               isPivotResult: true                                                                                                                                                                                                                                                        |
> |                   Statistics: Num rows: 74781721 Data size: 897380652 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                       |
> |                   Select Operator                                                                                                                                                                                                                                                                        |
> |                     expressions: LAG_window_0 (type: int)                                                                                                                                                                                                                                                |
> |                     outputColumnNames: _col0                                                                                                                                                                                                                                                             |
> |                     Statistics: Num rows: 74781721 Data size: 299126884 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                     |
> |                     File Output Operator                                                                                                                                                                                                                                                                 |
> |                       compressed: false                                                                                                                                                                                                                                                                  |
> |                       Statistics: Num rows: 74781721 Data size: 299126884 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                   |
> |                       table:                                                                                                                                                                                                                                                                             |
> |                           input format: org.apache.hadoop.mapred.SequenceFileInputFormat                                                                                                                                                                                                                 |
> |                           output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                                                                                                                                                                                                       |
> |                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                                                                                                                                                                                                      |
> |         Reducer 4                                                                                                                                                                                                                                                                                        |
> |             Execution mode: vectorized, llap                                                                                                                                                                                                                                                             |
> |             Reduce Operator Tree:                                                                                                                                                                                                                                                                        |
> |               Group By Operator                                                                                                                                                                                                                                                                          |
> |                 aggregations: min(VALUE._col0), max(VALUE._col1), bloom_filter(VALUE._col2, expectedEntries=17)                                                                                                                                                                                          |
> |                 mode: final                                                                                                                                                                                                                                                                              |
> |                 outputColumnNames: _col0, _col1, _col2                                                                                                                                                                                                                                                   |
> |                 Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                                      |
> |                 Reduce Output Operator                                                                                                                                                                                                                                                                   |
> |                   sort order:                                                                                                                                                                                                                                                                            |
> |                   Statistics: Num rows: 1 Data size: 336 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                                                    |
> |                   value expressions: _col0 (type: decimal(1,1)), _col1 (type: decimal(1,1)), _col2 (type: binary)                                                                                                                                                                                        |
> |                                                                                                                                                                                                                                                                                                          |
> |   Stage: Stage-0                                                                                                                                                                                                                                                                                         |
> |     Fetch Operator                                                                                                                                                                                                                                                                                       |
> |       limit: -1                                                                                                                                                                                                                                                                                          |
> |       Processor Tree:                                                                                                                                                                                                                                                                                    |
> |         ListSink                                                                                                                                                                                                                                                                                         |
> |                                                                                                                                                                                                                                                                                                          |
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> 135 rows selected (2.348 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
