hive-issues mailing list archives

From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization
Date Sun, 09 Oct 2016 20:27:20 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-11394:
--------------------------------
    Attachment: HIVE-11394.07.patch

> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>                 Key: HIVE-11394
>                 URL: https://issues.apache.org/jira/browse/HIVE-11394
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, HIVE-11394.06.patch, HIVE-11394.07.patch
>
>
> Add detail to the EXPLAIN output showing why Map or Reduce work is not vectorized.
> New syntax is: EXPLAIN VECTORIZATION [ONLY] [SUMMARY|DETAIL]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (whether vectorization is enabled) and a summary of Map and Reduce work.
> By default, ONLY is off and the level is SUMMARY.
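To illustrate the syntax above, here is a minimal Python sketch that assembles the statement text. The helper name explain_vectorization and its parameters are hypothetical and not part of Hive. It also assumes the query text follows the optional clauses, which the description does not spell out.

```python
from typing import Optional

# Levels named in the issue description; anything else is rejected.
_LEVELS = {"SUMMARY", "DETAIL"}


def explain_vectorization(query: str, only: bool = False,
                          level: Optional[str] = None) -> str:
    """Build an EXPLAIN VECTORIZATION [ONLY] [SUMMARY|DETAIL] statement.

    Hypothetical helper: it only concatenates keywords and does not
    validate the query itself.
    """
    parts = ["EXPLAIN", "VECTORIZATION"]
    if only:
        parts.append("ONLY")
    if level is not None:
        level = level.upper()
        if level not in _LEVELS:
            raise ValueError(f"unknown level: {level}")
        parts.append(level)
    # Assumption: the query comes after the optional clauses.
    parts.append(query.strip())
    return " ".join(parts)
```

Leaving level unset omits the keyword, which per the description should behave like SUMMARY.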
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> …
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: decimal_date_test
>                   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: boolean)
>                     Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: cdate (type: date)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: date)
>                         sort order: +
>                         Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2 
>             Execution mode: vectorized, llap
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 groupByVectorOutput: true
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: date)
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
>                   table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> EXPLAIN VECTORIZATION DETAIL example:
> (Note the added Select Vectorization, Group By Vectorization, Reduce Sink Vectorization sections in this example)
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> …
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> …
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: vectortab2korc
>                   Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: bo (type: boolean), b (type: bigint)
>                     outputColumnNames: bo, b
>                     Select Vectorization:
>                         className: VectorSelectOperator
>                         native: true
>                         nativeConditionsMet: Supported IS true
>                         selectExpressions: IdentityExpression[7:boolean], IdentityExpression[3:bigint]
>                         vectorized: true
>                     Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
>                     Group By Operator
>                       aggregations: max(b)
>                       Group By Vectorization:
>                           aggregators: VectorUDAFMaxLong(IdentityExpression[3:bigint])
>                           className: VectorGroupByOperator
>                           vectorOutput: true
>                           keyExpressions: IdentityExpression[7:boolean]
>                           native: false
>                           nativeConditionsNotMet: Supported IS false
>                           vectorized: true
>                       keys: bo (type: boolean)
>                       mode: hash
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: boolean)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: boolean)
>                         Reduce Sink Vectorization:
>                             className: VectorReduceSinkLongOperator
>                             native: true
>                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
>                             vectorized: true
>                         Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col1 (type: bigint)
>             Execution mode: vectorized
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2 
>             Execution mode: vectorized
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 groupByVectorOutput: true
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: max(VALUE._col0)
>                 Group By Vectorization:
>                     aggregators: VectorUDAFMaxLong(IdentityExpression[1:bigint])
>                     className: VectorGroupByOperator
>                     vectorOutput: true
>                     keyExpressions: IdentityExpression[0:boolean]
>                     native: false
>                     nativeConditionsNotMet: Supported IS false
>                     vectorized: true
>                 keys: KEY._col0 (type: boolean)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: boolean)
>                   sort order: -
>                   Reduce Sink Vectorization:
>                       className: VectorReduceSinkOperator
>                       native: false
>                       nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
>                       nativeConditionsNotMet: Uniform Hash IS false
>                       vectorized: true
>                   Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
>                   value expressions: _col1 (type: bigint)
> …
> {code}
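The nativeConditionsNotMet lines in the DETAIL output above are where the reasons for a non-native operator show up. As a rough sketch, and not anything shipped with the patch, the following Python scans the text output and pairs each such line with the most recent className line. The function name non_native_operators is hypothetical, and the code assumes each entry sits on a single unwrapped line with className printed before nativeConditionsNotMet, as in the example.

```python
import re
from typing import List, Optional, Tuple

# Split on commas that are not inside [...] so that
# "hive.execution.engine tez IN [tez, spark] IS true" stays in one piece.
_COMMA_OUTSIDE_BRACKETS = re.compile(r",\s*(?![^\[]*\])")


def non_native_operators(explain_text: str) -> List[Tuple[Optional[str], List[str]]]:
    """Return (className, unmet conditions) for each non-native operator."""
    found = []
    class_name = None
    for raw in explain_text.splitlines():
        line = raw.strip()
        if line.startswith("className:"):
            # Remember the operator class for the section being read.
            class_name = line.split(":", 1)[1].strip()
        elif line.startswith("nativeConditionsNotMet:"):
            conditions = line.split(":", 1)[1].strip()
            reasons = [c.strip() for c in _COMMA_OUTSIDE_BRACKETS.split(conditions)]
            found.append((class_name, reasons))
    return found


# Lines taken from the DETAIL example above.
sample = """
Group By Vectorization:
    className: VectorGroupByOperator
    native: false
    nativeConditionsNotMet: Supported IS false
Reduce Sink Vectorization:
    className: VectorReduceSinkOperator
    native: false
    nativeConditionsNotMet: Uniform Hash IS false
"""

for name, reasons in non_native_operators(sample):
    print(name, reasons)
```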
> EXPLAIN VECTORIZATION ONLY example:
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 1 <- Map 2 (BROADCAST_EDGE)
>       Vertices:
>         Map 1 
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Map 2 
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: true
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>   Stage: Stage-0
> {code}
> The standard @Explain annotation type is used. A new 'vectorization' annotation marks each new class and method.
> It also works with FORMATTED, as the other non-vectorization EXPLAIN variations do.
> EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED example:
> {code}
> {"PLAN VECTORIZATION":{"enabled":true,"enabledConditionsMet":["hive.vectorized.execution.enabled IS true"]},"STAGE DEPENDENCIES":{"Stage-1":{"ROOT STAGE":"TRUE"},"Stage-0":{"DEPENDENT STAGES":"Stage-1"}},"STAGE PLANS":{"Stage-1":{"Tez":{"Edges:":{"Map 1":[{"parent":"Map 3","type":"BROADCAST_EDGE"},{"parent":"Map 4","type":"BROADCAST_EDGE"}],"Reducer 2":{"parent":"Map 1","type":"SIMPLE_EDGE"}},"Vertices:":{"Map 1":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 3":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 4":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Reducer 2":{"Reduce Vectorization:":{"enabled:":"true","enableConditionsMet:":["hive.vectorized.execution.reduce.enabled IS true","hive.execution.engine tez IN [tez, spark] IS true"],"groupByVectorOutput:":"true","allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}}}}},"Stage-0":{}}}
> {code}
> or pretty printed:
> {code}
> {
>   "PLAN VECTORIZATION": {
>     "enabled": true,
>     "enabledConditionsMet": [
>       "hive.vectorized.execution.enabled IS true"
>     ]
>   },
>   "STAGE DEPENDENCIES": {
>     "Stage-1": {
>       "ROOT STAGE": "TRUE"
>     },
>     "Stage-0": {
>       "DEPENDENT STAGES": "Stage-1"
>     }
>   },
>   "STAGE PLANS": {
>     "Stage-1": {
>       "Tez": {
>         "Edges:": {
>           "Map 1": [
>             {
>               "parent": "Map 3",
>               "type": "BROADCAST_EDGE"
>             },
>             {
>               "parent": "Map 4",
>               "type": "BROADCAST_EDGE"
>             }
>           ],
>           "Reducer 2": {
>             "parent": "Map 1",
>             "type": "SIMPLE_EDGE"
>           }
>         },
>         "Vertices:": {
>           "Map 1": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "false",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Map 3": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "true",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Map 4": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "true",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Reducer 2": {
>             "Reduce Vectorization:": {
>               "enabled:": "true",
>               "enableConditionsMet:": [
>                 "hive.vectorized.execution.reduce.enabled IS true",
>                 "hive.execution.engine tez IN [tez, spark] IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "allNative:": "false",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           }
>         }
>       }
>     },
>     "Stage-0": {}
>   }
> }
> {code}
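Since the FORMATTED variant is JSON, the per-vertex summaries can be pulled out with a few lines of code. The Python below is a sketch under assumptions, not part of the patch: the function name vertex_vectorization is hypothetical, and it relies on the key layout shown in the example above (keys ending in a colon such as "Vertices:" and "Map Vectorization:", with boolean values given as strings). The sample is a trimmed copy of that output.

```python
import json
from typing import Dict


def vertex_vectorization(plan: dict) -> Dict[str, Dict[str, str]]:
    """Map each vertex name to its vectorization summary fields."""
    result = {}
    for stage in plan.get("STAGE PLANS", {}).values():
        # Each stage maps an engine name (e.g. "Tez") to its plan; may be empty.
        for engine_plan in stage.values():
            if not isinstance(engine_plan, dict):
                continue
            for name, vertex in engine_plan.get("Vertices:", {}).items():
                for key, section in vertex.items():
                    if key.endswith("Vectorization:"):
                        # Drop the trailing colon from field names.
                        result[name] = {k.rstrip(":"): v for k, v in section.items()}
    return result


# Trimmed from the FORMATTED example above.
sample = json.loads("""
{
  "STAGE PLANS": {
    "Stage-1": {
      "Tez": {
        "Vertices:": {
          "Map 1": {
            "Map Vectorization:": {"allNative:": "false", "vectorized:": "true"}
          },
          "Reducer 2": {
            "Reduce Vectorization:": {"allNative:": "false", "vectorized:": "true"}
          }
        }
      }
    },
    "Stage-0": {}
  }
}
""")

for vertex, fields in vertex_vectorization(sample).items():
    print(vertex, fields)
```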



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
