Date: Wed, 28 Dec 2016 10:09:58 +0000 (UTC)
From: "Hive QA (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Commented] (HIVE-11394) Enhance EXPLAIN display for vectorization

[ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782562#comment-15782562 ]

Hive QA commented on HIVE-11394:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844901/HIVE-11394.097.patch

{color:green}SUCCESS:{color} +1 due to 159 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 118 failed/errored test(s), 10869 tests executed

*Failed tests:*

{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=146)
	[load_dyn_part5.q,vector_complex_join.q,orc_llap.q,vectorization_pushdown.q,cbo_gby_empty.q,vectorization_short_regress.q,cbo_gby.q,auto_sortmerge_join_1.q,lineage3.q,cross_product_check_1.q,cbo_join.q,vector_struct_in.q,bucketmapjoin3.q,current_date_timestamp.q,orc_ppd_schema_evol_2a.q,groupby2.q,schema_evol_text_vec_table.q,vectorized_join46.q,orc_ppd_date.q,multiMapJoin1.q,sample10.q,vector_outer_join1.q,vector_char_simple.q,dynpart_sort_optimization_acid.q,auto_sortmerge_join_2.q,bucketizedhiveinputformat.q,leftsemijoin.q,special_character_in_tabnames_1.q,cte_mat_2.q,vectorization_8.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_aggregate_9] (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_between_columns] (batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_binary_join_groupby] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_cast_constant] (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_2] (batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_mapjoin1] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_coalesce_2] (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_count] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_aggregate] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_mapjoin] (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_precision] (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_distinct_2] (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_empty_where] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby4] (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby6] (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_3] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_mapjoin] (batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_grouping_sets] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_include_no_sel] (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_mapjoin] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_left_outer_join2] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_null_projection] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_orderby_5] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join0] (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join1] (batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join2] (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join3] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join4] (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join6] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_reduce_groupby_decimal] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_string_concat] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_tablesample_rows] (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_when_case_null] (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_13] (batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_limit] (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_date_funcs] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_mapjoin2] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_mapjoin] (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet_types] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_timestamp] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_timestamp_funcs] (batchId=28)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_complex] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_primitive] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_table] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_complex] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_complex] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_primitive] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_table] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_adaptor_usage_mode] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_aggregate_9] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_auto_smb_mapjoin_14] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_between_columns] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_binary_join_groupby] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_cast_constant] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_mapjoin1] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_coalesce_2] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_complex_all] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_count] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_aggregate] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_mapjoin] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_precision] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_mapjoin] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_grouping_sets] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_include_no_sel] (batchId=138)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_inner_join] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_mapjoin] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join30] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_left_outer_join2] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_leftsemi_mapjoin] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_null_projection] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_nullsafe_join] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_number_compare_projection] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_nvl] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join0] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join2] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_udf1] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_when_case_null] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_0] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_13] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_date_funcs] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_mapjoin] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_timestamp_funcs] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_inner_join] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join0] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join3] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join4] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join5] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] (batchId=92)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_limit] (batchId=92)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_cast_constant] (batchId=98)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_aggregate] (batchId=102)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_mapjoin] (batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_0] (batchId=129)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_13] (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress] (batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_mapjoin] (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_timestamp_funcs] (batchId=107)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[0] (batchId=172)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate3 (batchId=172)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2734/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2734/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2734/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 118 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12844901 - PreCommit-HIVE-Build

> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>                 Key: HIVE-11394
>                 URL: https://issues.apache.org/jira/browse/HIVE-11394
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>             Fix For: 2.2.0
>
>         Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch, HIVE-11394.09.patch, HIVE-11394.091.patch, HIVE-11394.092.patch, HIVE-11394.093.patch, HIVE-11394.094.patch, HIVE-11394.095.patch, HIVE-11394.096.patch, HIVE-11394.097.patch
>
>
> Add detail to the EXPLAIN output showing why Map and Reduce work is not vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization enabled) and a summary of Map and Reduce work.
> OPERATOR shows vectorization information for operators (e.g. Filter Vectorization). It includes all information of SUMMARY, too.
> EXPRESSION shows vectorization information for expressions (e.g. predicateExpression). It includes all information of SUMMARY and OPERATOR, too.
> DETAIL shows the most detailed vectorization information. It includes all information of SUMMARY, OPERATOR, and EXPRESSION, too.
> The defaults for the optional clauses are non-ONLY and SUMMARY.
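> For illustration, the clauses combine like this (a minimal sketch; any query can follow the clause, and alltypesorc is simply the sample table used in the plans below):
> {code}
> -- Default: non-ONLY output at the SUMMARY level.
> EXPLAIN VECTORIZATION
> SELECT COUNT(*) FROM alltypesorc;
>
> -- Equivalent, with the level stated explicitly.
> EXPLAIN VECTORIZATION SUMMARY
> SELECT COUNT(*) FROM alltypesorc;
>
> -- Operator-level detail, suppressing most non-vectorization plan elements.
> EXPLAIN VECTORIZATION ONLY OPERATOR
> SELECT COUNT(*) FROM alltypesorc;
> {code}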
> ----------------------------------------------------------------------------------------------------
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, and Reduce Vectorization sections.)
> Since SUMMARY is the default, this is also the output of EXPLAIN VECTORIZATION SUMMARY.
> Under Reducer 3's "Reduce Vectorization:" you'll see
> notVectorizedReason: Aggregation Function UDF avg parameter expression for GROUPBY operator: Data type struct of Column\[VALUE._col2\] not supported
> For Reducer 2's "Reduce Vectorization:" you'll see "groupByVectorOutput:": "false", which says the node has a GROUP BY with an AVG or some other aggregator that outputs a non-PRIMITIVE type (e.g. STRUCT), so all downstream operators are row-mode, i.e. not vector output.
> If "usesVectorUDFAdaptor:": "false" were instead true, it would mean that at least one vectorized expression uses VectorUDFAdaptor.
> And "allNative:": "false" will be true when all operators are native. Today, GROUP BY and FILE SINK are not native. MAP JOIN and REDUCE SINK are conditionally native. FILTER and SELECT are native.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> ...
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> ...
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: alltypesorc
>                   Statistics: Num rows: 12288 Data size: 36696 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: cint (type: int)
>                     outputColumnNames: cint
>                     Statistics: Num rows: 12288 Data size: 36696 Basic stats: COMPLETE Column stats: COMPLETE
>                     Group By Operator
>                       keys: cint (type: int)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 5775 Data size: 17248 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 5775 Data size: 17248 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2
>             Execution mode: vectorized, llap
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 groupByVectorOutput: false
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: int)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 5775 Data size: 17248 Basic stats: COMPLETE Column stats: COMPLETE
>                 Group By Operator
>                   aggregations: sum(_col0), count(_col0), avg(_col0), std(_col0)
>                   mode: hash
>                   outputColumnNames: _col0, _col1, _col2, _col3
>                   Statistics: Num rows: 1 Data size: 172 Basic stats: COMPLETE Column stats: COMPLETE
>                   Reduce Output Operator
>                     sort order:
>                     Statistics: Num rows: 1 Data size: 172 Basic stats: COMPLETE Column stats: COMPLETE
>                     value expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: struct), _col3 (type: struct)
>         Reducer 3
>             Execution mode: llap
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 notVectorizedReason: Aggregation Function UDF avg parameter expression for GROUPBY operator: Data type struct of Column[VALUE._col2] not supported
>                 vectorized: false
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: sum(VALUE._col0), count(VALUE._col1), avg(VALUE._col2), std(VALUE._col3)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
>                   table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> EXPLAIN VECTORIZATION OPERATOR
> Notice the added TableScan Vectorization, Select Vectorization, Group By Vectorization, Map Join Vectorization, and Reduce Sink Vectorization sections in this example.
> Notice the nativeConditionsMet detail on why the Reduce Sink vectorization is native.
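> The operator tree below appears to come from a simple inner join with an ORDER BY; a query of roughly the following shape would exercise these operators (a hypothetical reconstruction: only the aliases a and b and the columns c1/c2 come from the plan, while the table names are made up):
> {code}
> -- Hypothetical sketch reconstructed from the plan below.
> EXPLAIN VECTORIZATION OPERATOR
> SELECT a.c1, a.c2, b.c1, b.c2
> FROM char_table_1 a
> JOIN char_table_2 b ON a.c2 = b.c2
> ORDER BY a.c1;
> {code}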
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> #### A masked pattern was here ####
>       Edges:
>         Map 2 <- Map 1 (BROADCAST_EDGE)
>         Reducer 3 <- Map 2 (SIMPLE_EDGE)
> #### A masked pattern was here ####
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   Statistics: Num rows: 3 Data size: 294 Basic stats: COMPLETE Column stats: NONE
>                   TableScan Vectorization:
>                       native: true
>                       projectedOutputColumns: [0, 1]
>                   Filter Operator
>                     Filter Vectorization:
>                         className: VectorFilterOperator
>                         native: true
>                     predicate: c2 is not null (type: boolean)
>                     Statistics: Num rows: 3 Data size: 294 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: c1 (type: int), c2 (type: char(10))
>                       outputColumnNames: _col0, _col1
>                       Select Vectorization:
>                           className: VectorSelectOperator
>                           native: true
>                           projectedOutputColumns: [0, 1]
>                       Statistics: Num rows: 3 Data size: 294 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col1 (type: char(20))
>                         sort order: +
>                         Map-reduce partition columns: _col1 (type: char(20))
>                         Reduce Sink Vectorization:
>                             className: VectorReduceSinkStringOperator
>                             native: true
>                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
>                         Statistics: Num rows: 3 Data size: 294 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: int)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: true
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: b
>                   Statistics: Num rows: 3 Data size: 324 Basic stats: COMPLETE Column stats: NONE
>                   TableScan Vectorization:
>                       native: true
>                       projectedOutputColumns: [0, 1]
>                   Filter Operator
>                     Filter Vectorization:
>                         className: VectorFilterOperator
>                         native: true
>                     predicate: c2 is not null (type: boolean)
>                     Statistics: Num rows: 3 Data size: 324 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: c1 (type: int), c2 (type: char(20))
>                       outputColumnNames: _col0, _col1
>                       Select Vectorization:
>                           className: VectorSelectOperator
>                           native: true
>                           projectedOutputColumns: [0, 1]
>                       Statistics: Num rows: 3 Data size: 324 Basic stats: COMPLETE Column stats: NONE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col1 (type: char(20))
>                           1 _col1 (type: char(20))
>                         Map Join Vectorization:
>                             className: VectorMapJoinInnerStringOperator
>                             native: true
>                             nativeConditionsMet: hive.vectorized.execution.mapjoin.native.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS true, No nullsafe IS true, Supports Key Types IS true, Not empty key IS true, When Fast Hash Table, then requires no Hybrid Hash Join IS true, Small table vectorizes IS true
>                         outputColumnNames: _col0, _col1, _col2, _col3
>                         input vertices:
>                           0 Map 1
>                         Statistics: Num rows: 3 Data size: 323 Basic stats: COMPLETE Column stats: NONE
>                         Reduce Output Operator
>                           key expressions: _col0 (type: int)
>                           sort order: +
>                           Reduce Sink Vectorization:
>                               className: VectorReduceSinkOperator
>                               native: false
>                               nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
>                               nativeConditionsNotMet: Uniform Hash IS false
>                           Statistics: Num rows: 3 Data size: 323 Basic stats: COMPLETE Column stats: NONE
>                           value expressions: _col1 (type: char(10)), _col2 (type: int), _col3 (type: char(20))
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 3
>             Execution mode: vectorized, llap
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 groupByVectorOutput: true
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: char(10)), VALUE._col1 (type: int), VALUE._col2 (type: char(20))
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Select Vectorization:
>                     className: VectorSelectOperator
>                     native: true
>                     projectedOutputColumns: [0, 1, 2, 3]
>                 Statistics: Num rows: 3 Data size: 323 Basic stats: COMPLETE Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   File Sink Vectorization:
>                       className: VectorFileSinkOperator
>                       native: false
>                   Statistics: Num rows: 3 Data size: 323 Basic stats: COMPLETE Column stats: NONE
>                   table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> EXPLAIN VECTORIZATION EXPRESSION
> Notice the predicateExpression in this example.
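> The filter below comes from a WHERE clause full of timestamp/interval comparisons; a shortened query of roughly this shape illustrates the pattern (a hypothetical reconstruction; the plan's predicate covers every comparison operator over ts, dt, and day-time intervals, of which only two appear here):
> {code}
> -- Hypothetical sketch: two of the many comparisons in the plan below.
> EXPLAIN VECTORIZATION EXPRESSION
> SELECT ts FROM vector_interval_2
> WHERE TIMESTAMP '2001-01-01 01:02:03' = dt + INTERVAL '0 1:2:3' DAY TO SECOND
>   AND ts <> dt + INTERVAL '0 1:2:4' DAY TO SECOND
> ORDER BY ts;
> {code}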
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> #### A masked pattern was here ####
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
> #### A masked pattern was here ####
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: vector_interval_2
>                   Statistics: Num rows: 2 Data size: 788 Basic stats: COMPLETE Column stats: NONE
>                   TableScan Vectorization:
>                       native: true
>                       projectedOutputColumns: [0, 1, 2, 3, 4, 5]
>                   Filter Operator
>                     Filter Vectorization:
>                         className: VectorFilterOperator
>                         native: true
>                         predicateExpression: FilterExprAndExpr(children: FilterTimestampScalarEqualTimestampColumn(val 2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) -> 6:timestamp) -> boolean, FilterTimestampScalarNotEqualTimestampColumn(val 2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, FilterTimestampScalarLessEqualTimestampColumn(val 2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) -> 6:timestamp) -> boolean, FilterTimestampScalarLessTimestampColumn(val 2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, FilterTimestampScalarGreaterEqualTimestampColumn(val 2001-01-01 01:02:03.0, col 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) -> 6:timestamp) -> boolean, FilterTimestampScalarGreaterTimestampColumn(val 2001-01-01 01:02:03.0, col 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, FilterTimestampColEqualTimestampScalar(col 6, val 2001-01-01 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) -> 6:timestamp) -> boolean, FilterTimestampColNotEqualTimestampScalar(col 6, val 2001-01-01 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, FilterTimestampColGreaterEqualTimestampScalar(col 6, val 2001-01-01 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) -> 6:timestamp) -> boolean, FilterTimestampColGreaterTimestampScalar(col 6, val 2001-01-01 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, FilterTimestampColLessEqualTimestampScalar(col 6, val 2001-01-01 01:02:03.0)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) -> 6:timestamp) -> boolean, FilterTimestampColLessTimestampScalar(col 6, val 2001-01-01 01:02:03.0)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, FilterTimestampColEqualTimestampColumn(col 0, col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) -> 6:timestamp) -> boolean, FilterTimestampColNotEqualTimestampColumn(col 0, col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, FilterTimestampColLessEqualTimestampColumn(col 0, col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) -> 6:timestamp) -> boolean, FilterTimestampColLessTimestampColumn(col 0, col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, FilterTimestampColGreaterEqualTimestampColumn(col 0, col 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) -> 6:timestamp) -> boolean, FilterTimestampColGreaterTimestampColumn(col 0, col 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean) -> boolean
>                     predicate: ((2001-01-01 01:02:03.0 = (dt + 0 01:02:03.000000000)) and (2001-01-01 01:02:03.0 <> (dt + 0 01:02:04.000000000)) and (2001-01-01 01:02:03.0 <= (dt + 0 01:02:03.000000000)) and (2001-01-01 01:02:03.0 < (dt + 0 01:02:04.000000000)) and (2001-01-01 01:02:03.0 >= (dt - 0 01:02:03.000000000)) and (2001-01-01 01:02:03.0 > (dt - 0 01:02:04.000000000)) and ((dt + 0 01:02:03.000000000) = 2001-01-01 01:02:03.0) and ((dt + 0 01:02:04.000000000) <> 2001-01-01 01:02:03.0) and ((dt + 0 01:02:03.000000000) >= 2001-01-01 01:02:03.0) and ((dt + 0 01:02:04.000000000) > 2001-01-01 01:02:03.0) and ((dt - 0 01:02:03.000000000) <= 2001-01-01 01:02:03.0) and ((dt - 0 01:02:04.000000000) < 2001-01-01 01:02:03.0) and (ts = (dt + 0 01:02:03.000000000)) and (ts <> (dt + 0 01:02:04.000000000)) and (ts <= (dt + 0 01:02:03.000000000)) and (ts < (dt + 0 01:02:04.000000000)) and (ts >= (dt - 0 01:02:03.000000000)) and (ts > (dt - 0 01:02:04.000000000))) (type: boolean)
>                     Statistics: Num rows: 1 Data size: 394 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: ts (type: timestamp)
>                       outputColumnNames: _col0
>                       Select Vectorization:
>                           className: VectorSelectOperator
>                           native: true
>                           projectedOutputColumns: [0]
>                       Statistics: Num rows: 1 Data size: 394 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: timestamp)
>                         sort order: +
>                         Reduce Sink Vectorization:
>                             className: VectorReduceSinkOperator
>                             native: false
>                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
>                             nativeConditionsNotMet: Uniform Hash IS false
>                         Statistics: Num rows: 1 Data size: 394 Basic stats: COMPLETE Column stats: NONE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2
> ...
> {code}
> The standard @Explain Annotation Type is used. A new 'vectorization' annotation marks each new class and method.
> Works for FORMATTED, like other non-vectorization EXPLAIN variations.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)