From: mmccline@apache.org
To: commits@hive.apache.org
Message-Id: <699fc64f5ead400cb41c829be58b927d@git.apache.org>
Subject: hive git commit: HIVE-12435 SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled. (Matt McCline, reviewed by Prasanth J)
Date: Wed, 16 Dec 2015 09:19:38 +0000 (UTC)

Repository: hive
Updated Branches:
  refs/heads/branch-1 e2c8bfa12 -> 26728a8a3

HIVE-12435 SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled. (Matt McCline, reviewed by Prasanth J)

Project: http://git-wip-us.apache.org/repos/asf/hive/repo
Commit: http://git-wip-us.apache.org/repos/asf/hive/commit/26728a8a
Tree: http://git-wip-us.apache.org/repos/asf/hive/tree/26728a8a
Diff: http://git-wip-us.apache.org/repos/asf/hive/diff/26728a8a

Branch: refs/heads/branch-1
Commit: 26728a8a32f8259753e953d9c1a801d949aff5e3
Parents: e2c8bfa
Author: Matt McCline
Authored: Mon Dec 14 14:12:48 2015 -0800
Committer: Matt McCline
Committed: Wed Dec 16 01:19:27 2015 -0800

----------------------------------------------------------------------
 .../test/resources/testconfiguration.properties |    1 +
 .../resources/testconfiguration.properties.orig | 1190 ++++++++++++++++++
 .../ql/exec/vector/VectorizedBatchUtil.java     |   13 +-
 .../exec/vector/VectorizedBatchUtil.java.orig   |  707 +++++++++++
 .../ql/exec/vector/udf/VectorUDFArgDesc.java    |   12 +
 .../clientpositive/vector_when_case_null.q      |   14 +
 .../tez/vector_select_null2.q.out               |   95 ++
 .../tez/vector_when_case_null.q.out             |   96 ++
 .../clientpositive/vector_when_case_null.q.out  |   89 ++
 9 files changed, 2215 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/hive/blob/26728a8a/itests/src/test/resources/testconfiguration.properties ---------------------------------------------------------------------- diff --git a/itests/src/test/resources/testconfiguration.properties b/itests/src/test/resources/testconfiguration.properties index 03b07ce..1c8a80d 100644 --- a/itests/src/test/resources/testconfiguration.properties +++ b/itests/src/test/resources/testconfiguration.properties @@ -267,6 +267,7 @@ minitez.query.files.shared=acid_globallimit.q,\ vector_varchar_4.q,\ vector_varchar_mapjoin1.q,\ vector_varchar_simple.q,\ + 
vector_when_case_null.q,\ vectorization_0.q,\ vectorization_1.q,\ vectorization_10.q,\ http://git-wip-us.apache.org/repos/asf/hive/blob/26728a8a/itests/src/test/resources/testconfiguration.properties.orig ---------------------------------------------------------------------- diff --git a/itests/src/test/resources/testconfiguration.properties.orig b/itests/src/test/resources/testconfiguration.properties.orig new file mode 100644 index 0000000..03b07ce --- /dev/null +++ b/itests/src/test/resources/testconfiguration.properties.orig @@ -0,0 +1,1190 @@ +# NOTE: files should be listed in alphabetical order +minimr.query.files=auto_sortmerge_join_16.q,\ + bucket4.q,\ + bucket5.q,\ + bucket6.q,\ + bucket_many.q,\ + bucket_num_reducers.q,\ + bucket_num_reducers2.q,\ + bucketizedhiveinputformat.q,\ + bucketmapjoin6.q,\ + bucketmapjoin7.q,\ + constprog_partitioner.q,\ + disable_merge_for_bucketing.q,\ + empty_dir_in_table.q,\ + exchgpartition2lel.q,\ + external_table_with_space_in_location_path.q,\ + file_with_header_footer.q,\ + groupby2.q,\ + import_exported_table.q,\ + index_bitmap3.q,\ + index_bitmap_auto.q,\ + infer_bucket_sort_bucketed_table.q,\ + infer_bucket_sort_dyn_part.q,\ + infer_bucket_sort_map_operators.q,\ + infer_bucket_sort_merge.q,\ + infer_bucket_sort_num_buckets.q,\ + infer_bucket_sort_reducers_power_two.q,\ + input16_cc.q,\ + insert_dir_distcp.q,\ + join1.q,\ + join_acid_non_acid.q,\ + leftsemijoin_mr.q,\ + list_bucket_dml_10.q,\ + load_fs2.q,\ + load_hdfs_file_with_space_in_the_name.q,\ + non_native_window_udf.q, \ + orc_merge_diff_fs.q,\ + optrstat_groupby.q,\ + parallel_orderby.q,\ + ql_rewrite_gbtoidx.q,\ + ql_rewrite_gbtoidx_cbo_1.q,\ + ql_rewrite_gbtoidx_cbo_2.q,\ + quotedid_smb.q,\ + reduce_deduplicate.q,\ + remote_script.q,\ + root_dir_external_table.q,\ + schemeAuthority.q,\ + schemeAuthority2.q,\ + scriptfile1.q,\ + scriptfile1_win.q,\ + skewjoin_onesideskew.q,\ + smb_mapjoin_8.q,\ + stats_counter.q,\ + stats_counter_partitioned.q,\ + table_nonprintable.q,\ + temp_table_external.q,\ + truncate_column_buckets.q,\ + uber_reduce.q,\ + udf_using.q + +minitez.query.files.shared=acid_globallimit.q,\ + alter_merge_2_orc.q,\ + alter_merge_orc.q,\ + alter_merge_stats_orc.q,\ + auto_join0.q,\ + auto_join1.q,\ + bucket2.q,\ + bucket3.q,\ + bucket4.q,\ + cbo_gby.q,\ + cbo_gby_empty.q,\ + cbo_join.q,\ + cbo_limit.q,\ + cbo_semijoin.q,\ + cbo_simple_select.q,\ + cbo_stats.q,\ + cbo_subq_exists.q,\ + cbo_subq_in.q,\ + cbo_subq_not_in.q,\ + cbo_udf_udaf.q,\ + cbo_union.q,\ + cbo_views.q,\ + cbo_windowing.q,\ + correlationoptimizer1.q,\ + count.q,\ + create_merge_compressed.q,\ + cross_join.q,\ + cross_product_check_1.q,\ + cross_product_check_2.q,\ + ctas.q,\ + custom_input_output_format.q,\ + delete_all_non_partitioned.q,\ + delete_all_partitioned.q,\ + delete_orig_table.q,\ + delete_tmp_table.q,\ + delete_where_no_match.q,\ + delete_where_non_partitioned.q,\ + delete_where_partitioned.q,\ + delete_whole_partition.q,\ + disable_merge_for_bucketing.q,\ + dynpart_sort_opt_vectorization.q,\ + dynpart_sort_optimization.q,\ + dynpart_sort_optimization2.q,\ + enforce_order.q,\ + filter_join_breaktask.q,\ + filter_join_breaktask2.q,\ + groupby1.q,\ + groupby2.q,\ + groupby3.q,\ + having.q,\ + identity_project_remove_skip.q\ + insert1.q,\ + insert_into1.q,\ + insert_into2.q,\ + insert_orig_table.q,\ + insert_values_dynamic_partitioned.q,\ + insert_values_non_partitioned.q,\ + insert_values_orig_table.q\ + insert_values_partitioned.q,\ + insert_values_tmp_table.q,\ + insert_update_delete.q,\ + 
join0.q,\ + join1.q,\ + join_nullsafe.q,\ + leftsemijoin.q,\ + limit_pushdown.q,\ + load_dyn_part1.q,\ + load_dyn_part2.q,\ + load_dyn_part3.q,\ + mapjoin_mapjoin.q,\ + mapreduce1.q,\ + mapreduce2.q,\ + merge1.q,\ + merge2.q,\ + mergejoin.q,\ + metadataonly1.q,\ + metadata_only_queries.q,\ + optimize_nullscan.q,\ + orc_analyze.q,\ + orc_merge1.q,\ + orc_merge2.q,\ + orc_merge3.q,\ + orc_merge4.q,\ + orc_merge5.q,\ + orc_merge6.q,\ + orc_merge7.q,\ + orc_merge8.q,\ + orc_merge9.q,\ + orc_merge10.q,\ + orc_merge11.q,\ + orc_merge_incompat1.q,\ + orc_merge_incompat2.q,\ + orc_vectorization_ppd.q,\ + parallel.q,\ + ptf.q,\ + ptf_matchpath.q,\ + ptf_streaming.q,\ + sample1.q,\ + selectDistinctStar.q,\ + script_env_var1.q,\ + script_env_var2.q,\ + script_pipe.q,\ + scriptfile1.q,\ + select_dummy_source.q,\ + skewjoin.q,\ + stats_counter.q,\ + stats_counter_partitioned.q,\ + stats_noscan_1.q,\ + stats_only_null.q,\ + subquery_exists.q,\ + subquery_in.q,\ + temp_table.q,\ + transform1.q,\ + transform2.q,\ + transform_ppr1.q,\ + transform_ppr2.q,\ + union2.q,\ + union3.q,\ + union4.q,\ + union5.q,\ + union6.q,\ + union7.q,\ + union8.q,\ + union9.q,\ + unionDistinct_1.q,\ + unionDistinct_2.q,\ + union_fast_stats.q,\ + update_after_multiple_inserts.q,\ + update_all_non_partitioned.q,\ + update_all_partitioned.q,\ + update_all_types.q,\ + update_orig_table.q,\ + update_tmp_table.q,\ + update_where_no_match.q,\ + update_where_non_partitioned.q,\ + update_where_partitioned.q,\ + update_two_cols.q,\ + vector_acid3.q,\ + vector_aggregate_9.q,\ + vector_auto_smb_mapjoin_14.q,\ + vector_between_in.q,\ + vector_between_columns.q,\ + vector_binary_join_groupby.q,\ + vector_bucket.q,\ + vector_char_cast.q,\ + vector_cast_constant.q,\ + vector_char_2.q,\ + vector_char_4.q,\ + vector_char_mapjoin1.q,\ + vector_char_simple.q,\ + vector_coalesce.q,\ + vector_coalesce_2.q,\ + vector_count_distinct.q,\ + vector_data_types.q,\ + vector_date_1.q,\ + vector_decimal_1.q,\ + vector_decimal_10_0.q,\ + vector_decimal_2.q,\ + vector_decimal_3.q,\ + vector_decimal_4.q,\ + vector_decimal_5.q,\ + vector_decimal_6.q,\ + vector_decimal_aggregate.q,\ + vector_decimal_cast.q,\ + vector_decimal_expressions.q,\ + vector_decimal_mapjoin.q,\ + vector_decimal_math_funcs.q,\ + vector_decimal_precision.q,\ + vector_decimal_round.q,\ + vector_decimal_round_2.q,\ + vector_decimal_trailing.q,\ + vector_decimal_udf.q,\ + vector_decimal_udf2.q,\ + vector_distinct_2.q,\ + vector_elt.q,\ + vector_groupby_3.q,\ + vector_groupby_reduce.q,\ + vector_grouping_sets.q,\ + vector_if_expr.q,\ + vector_inner_join.q,\ + vector_interval_1.q,\ + vector_interval_2.q,\ + vector_join30.q,\ + vector_join_filters.q,\ + vector_join_nulls.q,\ + vector_left_outer_join.q,\ + vector_left_outer_join2.q,\ + vector_leftsemi_mapjoin.q,\ + vector_mapjoin_reduce.q,\ + vector_mr_diff_schema_alias.q,\ + vector_multi_insert.q,\ + vector_non_string_partition.q,\ + vector_nullsafe_join.q,\ + vector_null_projection.q,\ + vector_orderby_5.q,\ + vector_outer_join0.q,\ + vector_outer_join1.q,\ + vector_outer_join2.q,\ + vector_outer_join3.q,\ + vector_outer_join4.q,\ + vector_outer_join5.q,\ + vector_outer_join6.q,\ + vector_partition_diff_num_cols.q,\ + vector_partitioned_date_time.q,\ + vector_reduce_groupby_decimal.q,\ + vector_string_concat.q,\ + vector_varchar_4.q,\ + vector_varchar_mapjoin1.q,\ + vector_varchar_simple.q,\ + vectorization_0.q,\ + vectorization_1.q,\ + vectorization_10.q,\ + vectorization_11.q,\ + vectorization_12.q,\ + vectorization_13.q,\ + 
vectorization_14.q,\ + vectorization_15.q,\ + vectorization_16.q,\ + vectorization_17.q,\ + vectorization_2.q,\ + vectorization_3.q,\ + vectorization_4.q,\ + vectorization_5.q,\ + vectorization_6.q,\ + vectorization_7.q,\ + vectorization_8.q,\ + vectorization_9.q,\ + vectorization_decimal_date.q,\ + vectorization_div0.q,\ + vectorization_limit.q,\ + vectorization_nested_udf.q,\ + vectorization_not.q,\ + vectorization_part.q,\ + vectorization_part_project.q,\ + vectorization_pushdown.q,\ + vectorization_short_regress.q,\ + vectorized_bucketmapjoin1.q,\ + vectorized_case.q,\ + vectorized_casts.q,\ + vectorized_context.q,\ + vectorized_date_funcs.q,\ + vectorized_distinct_gby.q,\ + vectorized_mapjoin.q,\ + vectorized_math_funcs.q,\ + vectorized_nested_mapjoin.q,\ + vectorized_parquet.q,\ + vectorized_ptf.q,\ + vectorized_rcfile_columnar.q,\ + vectorized_shufflejoin.q,\ + vectorized_string_funcs.q,\ + vectorized_timestamp_funcs.q,\ + auto_sortmerge_join_1.q,\ + auto_sortmerge_join_10.q,\ + auto_sortmerge_join_11.q,\ + auto_sortmerge_join_12.q,\ + auto_sortmerge_join_13.q,\ + auto_sortmerge_join_14.q,\ + auto_sortmerge_join_15.q,\ + auto_sortmerge_join_16.q,\ + auto_sortmerge_join_2.q,\ + auto_sortmerge_join_3.q,\ + auto_sortmerge_join_4.q,\ + auto_sortmerge_join_5.q,\ + auto_sortmerge_join_7.q,\ + auto_sortmerge_join_8.q,\ + auto_sortmerge_join_9.q,\ + auto_join30.q,\ + auto_join21.q,\ + auto_join29.q,\ + auto_join_filters.q + + +minitez.query.files=bucket_map_join_tez1.q,\ + bucket_map_join_tez2.q,\ + dynamic_partition_pruning.q,\ + dynamic_partition_pruning_2.q,\ + explainuser_1.q,\ + explainuser_2.q,\ + hybridgrace_hashjoin_1.q,\ + hybridgrace_hashjoin_2.q,\ + mapjoin_decimal.q,\ + lvj_mapjoin.q, \ + mergejoin_3way.q,\ + mrr.q,\ + orc_ppd_basic.q,\ + orc_merge_diff_fs.q,\ + tez_bmj_schema_evolution.q,\ + tez_dml.q,\ + tez_fsstat.q,\ + tez_insert_overwrite_local_directory_1.q,\ + tez_dynpart_hashjoin_1.q,\ + tez_dynpart_hashjoin_2.q,\ + tez_vector_dynpart_hashjoin_1.q,\ + tez_vector_dynpart_hashjoin_2.q,\ + tez_join_hash.q,\ + tez_join_result_complex.q,\ + tez_join_tests.q,\ + tez_joins_explain.q,\ + tez_schema_evolution.q,\ + tez_self_join.q,\ + tez_union.q,\ + tez_union2.q,\ + tez_union_dynamic_partition.q,\ + tez_union_view.q,\ + tez_union_decimal.q,\ + tez_union_group_by.q,\ + tez_union_with_udf.q,\ + tez_smb_main.q,\ + tez_smb_1.q,\ + tez_smb_empty.q,\ + vector_join_part_col_char.q,\ + vectorized_dynamic_partition_pruning.q,\ + tez_multi_union.q,\ + tez_join.q,\ + tez_union_multiinsert.q + +encrypted.query.files=encryption_join_unencrypted_tbl.q,\ + encryption_insert_partition_static.q,\ + encryption_insert_partition_dynamic.q,\ + encryption_join_with_different_encryption_keys.q,\ + encryption_select_read_only_encrypted_tbl.q,\ + encryption_select_read_only_unencrypted_tbl.q,\ + encryption_load_data_to_encrypted_tables.q, \ + encryption_unencrypted_nonhdfs_external_tables.q \ + encryption_move_tbl.q \ + encryption_drop_table.q \ + encryption_insert_values.q \ + encryption_drop_view.q \ + encryption_drop_partition.q \ + encryption_with_trash.q + +beeline.positive.exclude=add_part_exist.q,\ + alter1.q,\ + alter2.q,\ + alter4.q,\ + alter5.q,\ + alter_rename_partition.q,\ + alter_rename_partition_authorization.q,\ + archive.q,\ + archive_corrupt.q,\ + archive_mr_1806.q,\ + archive_multi.q,\ + archive_multi_mr_1806.q,\ + authorization_1.q,\ + authorization_2.q,\ + authorization_4.q,\ + authorization_5.q,\ + authorization_6.q,\ + authorization_7.q,\ + ba_table1.q,\ + ba_table2.q,\ + 
ba_table3.q,\ + ba_table_udfs.q,\ + binary_table_bincolserde.q,\ + binary_table_colserde.q,\ + cluster.q,\ + columnarserde_create_shortcut.q,\ + combine2.q,\ + constant_prop.q,\ + create_nested_type.q,\ + create_or_replace_view.q,\ + create_struct_table.q,\ + create_union_table.q,\ + database.q,\ + database_location.q,\ + database_properties.q,\ + ddltime.q,\ + describe_database_json.q,\ + drop_database_removes_partition_dirs.q,\ + escape1.q,\ + escape2.q,\ + exim_00_nonpart_empty.q,\ + exim_01_nonpart.q,\ + exim_02_00_part_empty.q,\ + exim_02_part.q,\ + exim_03_nonpart_over_compat.q,\ + exim_04_all_part.q,\ + exim_04_evolved_parts.q,\ + exim_05_some_part.q,\ + exim_06_one_part.q,\ + exim_07_all_part_over_nonoverlap.q,\ + exim_08_nonpart_rename.q,\ + exim_09_part_spec_nonoverlap.q,\ + exim_10_external_managed.q,\ + exim_11_managed_external.q,\ + exim_12_external_location.q,\ + exim_13_managed_location.q,\ + exim_14_managed_location_over_existing.q,\ + exim_15_external_part.q,\ + exim_16_part_external.q,\ + exim_17_part_managed.q,\ + exim_18_part_external.q,\ + exim_19_00_part_external_location.q,\ + exim_19_part_external_location.q,\ + exim_20_part_managed_location.q,\ + exim_21_export_authsuccess.q,\ + exim_22_import_exist_authsuccess.q,\ + exim_23_import_part_authsuccess.q,\ + exim_24_import_nonexist_authsuccess.q,\ + global_limit.q,\ + groupby_complex_types.q,\ + groupby_complex_types_multi_single_reducer.q,\ + index_auth.q,\ + index_auto.q,\ + index_auto_empty.q,\ + index_bitmap.q,\ + index_bitmap1.q,\ + index_bitmap2.q,\ + index_bitmap3.q,\ + index_bitmap_auto.q,\ + index_bitmap_rc.q,\ + index_compact.q,\ + index_compact_1.q,\ + index_compact_2.q,\ + index_compact_3.q,\ + index_stale_partitioned.q,\ + init_file.q,\ + input16.q,\ + input16_cc.q,\ + input46.q,\ + input_columnarserde.q,\ + input_dynamicserde.q,\ + input_lazyserde.q,\ + input_testxpath3.q,\ + input_testxpath4.q,\ + insert2_overwrite_partitions.q,\ + insertexternal1.q,\ + join_thrift.q,\ + lateral_view.q,\ + load_binary_data.q,\ + load_exist_part_authsuccess.q,\ + load_nonpart_authsuccess.q,\ + load_part_authsuccess.q,\ + loadpart_err.q,\ + lock1.q,\ + lock2.q,\ + lock3.q,\ + lock4.q,\ + merge_dynamic_partition.q,\ + multi_insert.q,\ + multi_insert_move_tasks_share_dependencies.q,\ + null_column.q,\ + ppd_clusterby.q,\ + query_with_semi.q,\ + rename_column.q,\ + sample6.q,\ + sample_islocalmode_hook.q,\ + set_processor_namespaces.q,\ + show_tables.q,\ + source.q,\ + split_sample.q,\ + str_to_map.q,\ + transform1.q,\ + udaf_collect_set.q,\ + udaf_context_ngrams.q,\ + udaf_histogram_numeric.q,\ + udaf_ngrams.q,\ + udaf_percentile_approx.q,\ + udf_array.q,\ + udf_bitmap_and.q,\ + udf_bitmap_or.q,\ + udf_explode.q,\ + udf_format_number.q,\ + udf_map.q,\ + udf_map_keys.q,\ + udf_map_values.q,\ + udf_max.q,\ + udf_min.q,\ + udf_named_struct.q,\ + udf_percentile.q,\ + udf_printf.q,\ + udf_sentences.q,\ + udf_sort_array.q,\ + udf_split.q,\ + udf_struct.q,\ + udf_substr.q,\ + udf_translate.q,\ + udf_union.q,\ + udf_xpath.q,\ + udtf_stack.q,\ + view.q,\ + virtual_column.q + +minimr.query.negative.files=cluster_tasklog_retrieval.q,\ + file_with_header_footer_negative.q,\ + local_mapred_error_cache.q,\ + mapreduce_stack_trace.q,\ + mapreduce_stack_trace_hadoop20.q,\ + mapreduce_stack_trace_turnoff.q,\ + mapreduce_stack_trace_turnoff_hadoop20.q,\ + minimr_broken_pipe.q,\ + table_nonprintable_negative.q,\ + udf_local_resource.q + +# tests are sorted use: perl -pe 's@\\\s*\n@ @g' testconfiguration.properties \ +# | awk -F= 
'/spark.query.files/{print $2}' | perl -pe 's@.q *, *@\n@g' \ +# | egrep -v '^ *$' | sort -V | uniq | perl -pe 's@\n@.q, \\\n@g' | perl -pe 's@^@ @g' +spark.query.files=add_part_multiple.q, \ + alter_merge_orc.q, \ + alter_merge_stats_orc.q, \ + annotate_stats_join.q, \ + auto_join0.q, \ + auto_join1.q, \ + auto_join10.q, \ + auto_join11.q, \ + auto_join12.q, \ + auto_join13.q, \ + auto_join14.q, \ + auto_join15.q, \ + auto_join16.q, \ + auto_join17.q, \ + auto_join18.q, \ + auto_join18_multi_distinct.q, \ + auto_join19.q, \ + auto_join2.q, \ + auto_join20.q, \ + auto_join21.q, \ + auto_join22.q, \ + auto_join23.q, \ + auto_join24.q, \ + auto_join26.q, \ + auto_join27.q, \ + auto_join28.q, \ + auto_join29.q, \ + auto_join3.q, \ + auto_join30.q, \ + auto_join31.q, \ + auto_join32.q, \ + auto_join4.q, \ + auto_join5.q, \ + auto_join6.q, \ + auto_join7.q, \ + auto_join8.q, \ + auto_join9.q, \ + auto_join_filters.q, \ + auto_join_nulls.q, \ + auto_join_reordering_values.q, \ + auto_join_stats.q, \ + auto_join_stats2.q, \ + auto_join_without_localtask.q, \ + auto_smb_mapjoin_14.q, \ + auto_sortmerge_join_1.q, \ + auto_sortmerge_join_10.q, \ + auto_sortmerge_join_12.q, \ + auto_sortmerge_join_13.q, \ + auto_sortmerge_join_14.q, \ + auto_sortmerge_join_15.q, \ + auto_sortmerge_join_16.q, \ + auto_sortmerge_join_2.q, \ + auto_sortmerge_join_3.q, \ + auto_sortmerge_join_4.q, \ + auto_sortmerge_join_5.q, \ + auto_sortmerge_join_6.q, \ + auto_sortmerge_join_7.q, \ + auto_sortmerge_join_8.q, \ + auto_sortmerge_join_9.q, \ + avro_compression_enabled_native.q, \ + avro_decimal_native.q, \ + avro_joins.q, \ + avro_joins_native.q, \ + bucket2.q, \ + bucket3.q, \ + bucket4.q, \ + bucket_map_join_1.q, \ + bucket_map_join_2.q, \ + bucket_map_join_spark1.q, \ + bucket_map_join_spark2.q, \ + bucket_map_join_spark3.q, \ + bucket_map_join_spark4.q, \ + bucket_map_join_tez1.q, \ + bucket_map_join_tez2.q, \ + bucketmapjoin1.q, \ + bucketmapjoin10.q, \ + bucketmapjoin11.q, \ + bucketmapjoin12.q, \ + bucketmapjoin13.q, \ + bucketmapjoin2.q, \ + bucketmapjoin3.q, \ + bucketmapjoin4.q, \ + bucketmapjoin5.q, \ + bucketmapjoin7.q, \ + bucketmapjoin8.q, \ + bucketmapjoin9.q, \ + bucketmapjoin_negative.q, \ + bucketmapjoin_negative2.q, \ + bucketmapjoin_negative3.q, \ + bucketsortoptimize_insert_2.q, \ + bucketsortoptimize_insert_4.q, \ + bucketsortoptimize_insert_6.q, \ + bucketsortoptimize_insert_7.q, \ + bucketsortoptimize_insert_8.q, \ + cbo_gby.q, \ + cbo_gby_empty.q, \ + cbo_limit.q, \ + cbo_semijoin.q, \ + cbo_simple_select.q, \ + cbo_stats.q, \ + cbo_subq_in.q, \ + cbo_subq_not_in.q, \ + cbo_udf_udaf.q, \ + cbo_union.q, \ + column_access_stats.q, \ + count.q, \ + create_merge_compressed.q, \ + cross_join.q, \ + cross_product_check_1.q, \ + cross_product_check_2.q, \ + ctas.q, \ + custom_input_output_format.q, \ + date_join1.q, \ + date_udf.q, \ + decimal_1_1.q, \ + decimal_join.q, \ + disable_merge_for_bucketing.q, \ + dynamic_rdd_cache.q, \ + enforce_order.q, \ + escape_clusterby1.q, \ + escape_distributeby1.q, \ + escape_orderby1.q, \ + escape_sortby1.q, \ + filter_join_breaktask.q, \ + filter_join_breaktask2.q, \ + groupby1.q, \ + groupby10.q, \ + groupby11.q, \ + groupby2.q, \ + groupby3.q, \ + groupby3_map.q, \ + groupby3_map_multi_distinct.q, \ + groupby3_map_skew.q, \ + groupby3_noskew.q, \ + groupby3_noskew_multi_distinct.q, \ + groupby4.q, \ + groupby7.q, \ + groupby7_map.q, \ + groupby7_map_multi_single_reducer.q, \ + groupby7_map_skew.q, \ + groupby7_noskew.q, \ + groupby7_noskew_multi_single_reducer.q, 
\ + groupby8.q, \ + groupby8_map.q, \ + groupby8_map_skew.q, \ + groupby8_noskew.q, \ + groupby9.q, \ + groupby_bigdata.q, \ + groupby_complex_types.q, \ + groupby_complex_types_multi_single_reducer.q, \ + groupby_cube1.q, \ + groupby_map_ppr.q, \ + groupby_map_ppr_multi_distinct.q, \ + groupby_multi_insert_common_distinct.q, \ + groupby_multi_single_reducer.q, \ + groupby_multi_single_reducer2.q, \ + groupby_multi_single_reducer3.q, \ + groupby_position.q, \ + groupby_ppr.q, \ + groupby_rollup1.q, \ + groupby_sort_1_23.q, \ + groupby_sort_skew_1_23.q, \ + having.q, \ + identity_project_remove_skip.q, \ + index_auto_self_join.q, \ + innerjoin.q, \ + input12.q, \ + input13.q, \ + input14.q, \ + input17.q, \ + input18.q, \ + input1_limit.q, \ + input_part2.q, \ + insert1.q, \ + insert_into1.q, \ + insert_into2.q, \ + insert_into3.q, \ + join0.q, \ + join1.q, \ + join10.q, \ + join11.q, \ + join12.q, \ + join13.q, \ + join14.q, \ + join15.q, \ + join16.q, \ + join17.q, \ + join18.q, \ + join18_multi_distinct.q, \ + join19.q, \ + join2.q, \ + join20.q, \ + join21.q, \ + join22.q, \ + join23.q, \ + join24.q, \ + join25.q, \ + join26.q, \ + join27.q, \ + join28.q, \ + join29.q, \ + join3.q, \ + join30.q, \ + join31.q, \ + join32.q, \ + join32_lessSize.q, \ + join33.q, \ + join34.q, \ + join35.q, \ + join36.q, \ + join37.q, \ + join38.q, \ + join39.q, \ + join4.q, \ + join40.q, \ + join41.q, \ + join5.q, \ + join6.q, \ + join7.q, \ + join8.q, \ + join9.q, \ + join_1to1.q, \ + join_alt_syntax.q, \ + join_array.q, \ + join_casesensitive.q, \ + join_cond_pushdown_1.q, \ + join_cond_pushdown_2.q, \ + join_cond_pushdown_3.q, \ + join_cond_pushdown_4.q, \ + join_cond_pushdown_unqual1.q, \ + join_cond_pushdown_unqual2.q, \ + join_cond_pushdown_unqual3.q, \ + join_cond_pushdown_unqual4.q, \ + join_empty.q, \ + join_filters_overlap.q, \ + join_hive_626.q, \ + join_literals.q, \ + join_map_ppr.q, \ + join_merge_multi_expressions.q, \ + join_merging.q, \ + join_nullsafe.q, \ + join_rc.q, \ + join_reorder.q, \ + join_reorder2.q, \ + join_reorder3.q, \ + join_reorder4.q, \ + join_star.q, \ + join_thrift.q, \ + join_vc.q, \ + join_view.q, \ + lateral_view_explode2.q, \ + leftsemijoin.q, \ + leftsemijoin_mr.q, \ + limit_partition_metadataonly.q, \ + limit_pushdown.q, \ + list_bucket_dml_2.q, \ + load_dyn_part1.q, \ + load_dyn_part10.q, \ + load_dyn_part11.q, \ + load_dyn_part12.q, \ + load_dyn_part13.q, \ + load_dyn_part14.q, \ + load_dyn_part15.q, \ + load_dyn_part2.q, \ + load_dyn_part3.q, \ + load_dyn_part4.q, \ + load_dyn_part5.q, \ + load_dyn_part6.q, \ + load_dyn_part7.q, \ + load_dyn_part8.q, \ + load_dyn_part9.q, \ + louter_join_ppr.q, \ + mapjoin1.q, \ + mapjoin_addjar.q, \ + mapjoin_decimal.q, \ + mapjoin_distinct.q, \ + mapjoin_filter_on_outerjoin.q, \ + mapjoin_mapjoin.q, \ + mapjoin_memcheck.q, \ + mapjoin_subquery.q, \ + mapjoin_subquery2.q, \ + mapjoin_test_outer.q, \ + mapreduce1.q, \ + mapreduce2.q, \ + merge1.q, \ + merge2.q, \ + mergejoins.q, \ + mergejoins_mixed.q, \ + metadata_only_queries.q, \ + metadata_only_queries_with_filters.q, \ + multi_insert.q, \ + multi_insert_gby.q, \ + multi_insert_gby2.q, \ + multi_insert_gby3.q, \ + multi_insert_lateral_view.q, \ + multi_insert_mixed.q, \ + multi_insert_move_tasks_share_dependencies.q, \ + multi_join_union.q, \ + multi_join_union_src.q, \ + multigroupby_singlemr.q, \ + optimize_nullscan.q, \ + order.q, \ + order2.q, \ + outer_join_ppr.q, \ + parallel.q, \ + parallel_join0.q, \ + parallel_join1.q, \ + parquet_join.q, \ + pcr.q, \ + 
ppd_gby_join.q, \ + ppd_join.q, \ + ppd_join2.q, \ + ppd_join3.q, \ + ppd_join4.q, \ + ppd_join5.q, \ + ppd_join_filter.q, \ + ppd_multi_insert.q, \ + ppd_outer_join1.q, \ + ppd_outer_join2.q, \ + ppd_outer_join3.q, \ + ppd_outer_join4.q, \ + ppd_outer_join5.q, \ + ppd_transform.q, \ + ptf.q, \ + ptf_decimal.q, \ + ptf_general_queries.q, \ + ptf_matchpath.q, \ + ptf_rcfile.q, \ + ptf_register_tblfn.q, \ + ptf_seqfile.q, \ + ptf_streaming.q, \ + rcfile_bigdata.q, \ + reduce_deduplicate_exclude_join.q, \ + router_join_ppr.q, \ + runtime_skewjoin_mapjoin_spark.q, \ + sample1.q, \ + sample10.q, \ + sample2.q, \ + sample3.q, \ + sample4.q, \ + sample5.q, \ + sample6.q, \ + sample7.q, \ + sample8.q, \ + sample9.q, \ + script_env_var1.q, \ + script_env_var2.q, \ + script_pipe.q, \ + scriptfile1.q, \ + semijoin.q, \ + skewjoin.q, \ + skewjoin_noskew.q, \ + skewjoin_union_remove_1.q, \ + skewjoin_union_remove_2.q, \ + skewjoinopt1.q, \ + skewjoinopt10.q, \ + skewjoinopt11.q, \ + skewjoinopt12.q, \ + skewjoinopt13.q, \ + skewjoinopt14.q, \ + skewjoinopt15.q, \ + skewjoinopt16.q, \ + skewjoinopt17.q, \ + skewjoinopt18.q, \ + skewjoinopt19.q, \ + skewjoinopt2.q, \ + skewjoinopt20.q, \ + skewjoinopt3.q, \ + skewjoinopt4.q, \ + skewjoinopt5.q, \ + skewjoinopt6.q, \ + skewjoinopt7.q, \ + skewjoinopt8.q, \ + skewjoinopt9.q, \ + smb_mapjoin_1.q, \ + smb_mapjoin_10.q, \ + smb_mapjoin_11.q, \ + smb_mapjoin_12.q, \ + smb_mapjoin_13.q, \ + smb_mapjoin_14.q, \ + smb_mapjoin_15.q, \ + smb_mapjoin_16.q, \ + smb_mapjoin_17.q, \ + smb_mapjoin_18.q, \ + smb_mapjoin_19.q, \ + smb_mapjoin_2.q, \ + smb_mapjoin_20.q, \ + smb_mapjoin_21.q, \ + smb_mapjoin_22.q, \ + smb_mapjoin_25.q, \ + smb_mapjoin_3.q, \ + smb_mapjoin_4.q, \ + smb_mapjoin_5.q, \ + smb_mapjoin_6.q, \ + smb_mapjoin_7.q, \ + smb_mapjoin_8.q, \ + smb_mapjoin_9.q, \ + sort.q, \ + stats0.q, \ + stats1.q, \ + stats10.q, \ + stats12.q, \ + stats13.q, \ + stats14.q, \ + stats15.q, \ + stats16.q, \ + stats18.q, \ + stats2.q, \ + stats20.q, \ + stats3.q, \ + stats5.q, \ + stats6.q, \ + stats7.q, \ + stats8.q, \ + stats9.q, \ + stats_counter.q, \ + stats_counter_partitioned.q, \ + stats_noscan_1.q, \ + stats_noscan_2.q, \ + stats_only_null.q, \ + stats_partscan_1_23.q, \ + statsfs.q, \ + subquery_exists.q, \ + subquery_in.q, \ + subquery_multiinsert.q, \ + table_access_keys_stats.q, \ + temp_table.q, \ + temp_table_join1.q, \ + tez_join_tests.q, \ + tez_joins_explain.q, \ + timestamp_1.q, \ + timestamp_2.q, \ + timestamp_3.q, \ + timestamp_comparison.q, \ + timestamp_lazy.q, \ + timestamp_null.q, \ + timestamp_udf.q, \ + transform1.q, \ + transform2.q, \ + transform_ppr1.q, \ + transform_ppr2.q, \ + udf_example_add.q, \ + udf_in_file.q, \ + union.q, \ + union10.q, \ + union11.q, \ + union12.q, \ + union13.q, \ + union14.q, \ + union15.q, \ + union16.q, \ + union17.q, \ + union18.q, \ + union19.q, \ + union2.q, \ + union20.q, \ + union21.q, \ + union22.q, \ + union23.q, \ + union24.q, \ + union25.q, \ + union26.q, \ + union27.q, \ + union28.q, \ + union29.q, \ + union3.q, \ + union30.q, \ + union31.q, \ + union32.q, \ + union33.q, \ + union34.q, \ + union4.q, \ + union5.q, \ + union6.q, \ + union7.q, \ + union8.q, \ + union9.q, \ + union_date.q, \ + union_date_trim.q, \ + union_lateralview.q, \ + union_null.q, \ + union_ppr.q, \ + union_remove_1.q, \ + union_remove_10.q, \ + union_remove_11.q, \ + union_remove_12.q, \ + union_remove_13.q, \ + union_remove_14.q, \ + union_remove_15.q, \ + union_remove_16.q, \ + union_remove_17.q, \ + union_remove_18.q, \ + 
union_remove_19.q, \ + union_remove_2.q, \ + union_remove_20.q, \ + union_remove_21.q, \ + union_remove_22.q, \ + union_remove_23.q, \ + union_remove_24.q, \ + union_remove_25.q, \ + union_remove_3.q, \ + union_remove_4.q, \ + union_remove_5.q, \ + union_remove_6.q, \ + union_remove_6_subq.q, \ + union_remove_7.q, \ + union_remove_8.q, \ + union_remove_9.q, \ + union_script.q, \ + union_top_level.q, \ + uniquejoin.q, \ + union_view.q, \ + varchar_join1.q, \ + vector_between_in.q, \ + vector_cast_constant.q, \ + vector_char_4.q, \ + vector_count_distinct.q, \ + vector_data_types.q, \ + vector_decimal_aggregate.q, \ + vector_decimal_mapjoin.q, \ + vector_distinct_2.q, \ + vector_elt.q, \ + vector_groupby_3.q, \ + vector_left_outer_join.q, \ + vector_mapjoin_reduce.q, \ + vector_orderby_5.q, \ + vector_string_concat.q, \ + vector_varchar_4.q, \ + vectorization_0.q, \ + vectorization_1.q, \ + vectorization_10.q, \ + vectorization_11.q, \ + vectorization_12.q, \ + vectorization_13.q, \ + vectorization_14.q, \ + vectorization_15.q, \ + vectorization_16.q, \ + vectorization_17.q, \ + vectorization_2.q, \ + vectorization_3.q, \ + vectorization_4.q, \ + vectorization_5.q, \ + vectorization_6.q, \ + vectorization_9.q, \ + vectorization_decimal_date.q, \ + vectorization_div0.q, \ + vectorization_nested_udf.q, \ + vectorization_not.q, \ + vectorization_part.q, \ + vectorization_part_project.q, \ + vectorization_pushdown.q, \ + vectorization_short_regress.q, \ + vectorized_case.q, \ + vectorized_mapjoin.q, \ + vectorized_math_funcs.q, \ + vectorized_nested_mapjoin.q, \ + vectorized_ptf.q, \ + vectorized_rcfile_columnar.q, \ + vectorized_shufflejoin.q, \ + vectorized_string_funcs.q, \ + vectorized_timestamp_funcs.q, \ + windowing.q + +# Unlike "spark.query.files" above, these tests only run +# under Spark engine. 
+spark.only.query.files=spark_dynamic_partition_pruning.q,\ + spark_dynamic_partition_pruning_2.q,\ + spark_vectorized_dynamic_partition_pruning.q + +miniSparkOnYarn.query.files=auto_sortmerge_join_16.q,\ + bucket4.q,\ + bucket5.q,\ + bucket6.q,\ + bucketizedhiveinputformat.q,\ + bucketmapjoin6.q,\ + bucketmapjoin7.q,\ + constprog_partitioner.q,\ + disable_merge_for_bucketing.q,\ + empty_dir_in_table.q,\ + external_table_with_space_in_location_path.q,\ + file_with_header_footer.q,\ + import_exported_table.q,\ + index_bitmap3.q,\ + index_bitmap_auto.q,\ + infer_bucket_sort_bucketed_table.q,\ + infer_bucket_sort_map_operators.q,\ + infer_bucket_sort_merge.q,\ + infer_bucket_sort_num_buckets.q,\ + infer_bucket_sort_reducers_power_two.q,\ + input16_cc.q,\ + leftsemijoin_mr.q,\ + list_bucket_dml_10.q,\ + load_fs2.q,\ + load_hdfs_file_with_space_in_the_name.q,\ + optrstat_groupby.q,\ + orc_merge1.q,\ + orc_merge2.q,\ + orc_merge3.q,\ + orc_merge4.q,\ + orc_merge5.q,\ + orc_merge6.q,\ + orc_merge7.q,\ + orc_merge8.q,\ + orc_merge9.q,\ + orc_merge_diff_fs.q,\ + orc_merge_incompat1.q,\ + orc_merge_incompat2.q,\ + parallel_orderby.q,\ + ql_rewrite_gbtoidx.q,\ + ql_rewrite_gbtoidx_cbo_1.q,\ + quotedid_smb.q,\ + reduce_deduplicate.q,\ + remote_script.q,\ + root_dir_external_table.q,\ + schemeAuthority.q,\ + schemeAuthority2.q,\ + scriptfile1.q,\ + scriptfile1_win.q,\ + smb_mapjoin_8.q,\ + stats_counter.q,\ + stats_counter_partitioned.q,\ + temp_table_external.q,\ + truncate_column_buckets.q,\ + uber_reduce.q,\ + vector_inner_join.q,\ + vector_outer_join0.q,\ + vector_outer_join1.q,\ + vector_outer_join2.q,\ + vector_outer_join3.q,\ + vector_outer_join4.q,\ + vector_outer_join5.q + +spark.query.negative.files=groupby2_map_skew_multi_distinct.q http://git-wip-us.apache.org/repos/asf/hive/blob/26728a8a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java ---------------------------------------------------------------------- diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java index af43a07..3d7e4f0 100644 --- a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java +++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java @@ -662,12 +662,21 @@ public class VectorizedBatchUtil { public static void debugDisplayOneRow(VectorizedRowBatch batch, int index, String prefix) { StringBuilder sb = new StringBuilder(); sb.append(prefix + " row " + index + " "); - for (int column = 0; column < batch.cols.length; column++) { + for (int p = 0; p < batch.projectionSize; p++) { + int column = batch.projectedColumns[p]; + if (p == column) { + sb.append("(col " + p + ") "); + } else { + sb.append("(proj col " + p + " col " + column + ") "); + } ColumnVector colVector = batch.cols[column]; if (colVector == null) { - sb.append("(null colVector " + column + ")"); + sb.append("(null ColumnVector)"); } else { boolean isRepeating = colVector.isRepeating; + if (isRepeating) { + sb.append("(repeating)"); + } index = (isRepeating ? 
0 : index); if (colVector.noNulls || !colVector.isNull[index]) { if (colVector instanceof LongColumnVector) { http://git-wip-us.apache.org/repos/asf/hive/blob/26728a8a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java.orig ---------------------------------------------------------------------- diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java.orig b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java.orig new file mode 100644 index 0000000..af43a07 --- /dev/null +++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java.orig @@ -0,0 +1,707 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector; + +import java.io.IOException; +import java.sql.Timestamp; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.hive.common.ObjectPair; +import org.apache.hadoop.hive.common.type.HiveChar; +import org.apache.hadoop.hive.common.type.HiveIntervalDayTime; +import org.apache.hadoop.hive.common.type.HiveIntervalYearMonth; +import org.apache.hadoop.hive.common.type.HiveVarchar; +import org.apache.hadoop.hive.ql.exec.Utilities; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.serde2.io.ByteWritable; +import org.apache.hadoop.hive.serde2.io.DateWritable; +import org.apache.hadoop.hive.serde2.io.DoubleWritable; +import org.apache.hadoop.hive.serde2.io.HiveCharWritable; +import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable; +import org.apache.hadoop.hive.serde2.io.HiveIntervalDayTimeWritable; +import org.apache.hadoop.hive.serde2.io.HiveIntervalYearMonthWritable; +import org.apache.hadoop.hive.serde2.io.HiveVarcharWritable; +import org.apache.hadoop.hive.serde2.io.ShortWritable; +import org.apache.hadoop.hive.serde2.io.TimestampWritable; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory; +import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructField; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import 
org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils; +import org.apache.hadoop.io.BooleanWritable; +import org.apache.hadoop.io.BytesWritable; +import org.apache.hadoop.io.DataOutputBuffer; +import org.apache.hadoop.io.FloatWritable; +import org.apache.hadoop.io.IntWritable; +import org.apache.hadoop.io.LongWritable; +import org.apache.hadoop.io.Text; +import org.apache.hive.common.util.DateUtils; + +public class VectorizedBatchUtil { + private static final Log LOG = LogFactory.getLog(VectorizedBatchUtil.class); + + /** + * Sets the IsNull value for ColumnVector at specified index + * @param cv + * @param rowIndex + */ + public static void setNullColIsNullValue(ColumnVector cv, int rowIndex) { + cv.isNull[rowIndex] = true; + if (cv.noNulls) { + cv.noNulls = false; + } + } + + /** + * Iterates thru all the column vectors and sets noNull to + * specified value. + * + * @param batch + * Batch on which noNull is set + */ + public static void setNoNullFields(VectorizedRowBatch batch) { + for (int i = 0; i < batch.numCols; i++) { + batch.cols[i].noNulls = true; + } + } + + /** + * Iterates thru all the column vectors and sets repeating to + * specified column. + * + */ + public static void setRepeatingColumn(VectorizedRowBatch batch, int column) { + ColumnVector cv = batch.cols[column]; + cv.isRepeating = true; + } + + /** + * Reduce the batch size for a vectorized row batch + */ + public static void setBatchSize(VectorizedRowBatch batch, int size) { + assert (size <= batch.getMaxSize()); + batch.size = size; + } + + /** + * Walk through the object inspector and add column vectors + * + * @param oi + * @param cvList + * ColumnVectors are populated in this list + */ + private static void allocateColumnVector(StructObjectInspector oi, + List cvList) throws HiveException { + if (cvList == null) { + throw new HiveException("Null columnvector list"); + } + if (oi == null) { + return; + } + final List fields = oi.getAllStructFieldRefs(); + for(StructField field : fields) { + ObjectInspector fieldObjectInspector = field.getFieldObjectInspector(); + switch(fieldObjectInspector.getCategory()) { + case PRIMITIVE: + PrimitiveObjectInspector poi = (PrimitiveObjectInspector) fieldObjectInspector; + switch(poi.getPrimitiveCategory()) { + case BOOLEAN: + case BYTE: + case SHORT: + case INT: + case LONG: + case TIMESTAMP: + case DATE: + case INTERVAL_YEAR_MONTH: + case INTERVAL_DAY_TIME: + cvList.add(new LongColumnVector(VectorizedRowBatch.DEFAULT_SIZE)); + break; + case FLOAT: + case DOUBLE: + cvList.add(new DoubleColumnVector(VectorizedRowBatch.DEFAULT_SIZE)); + break; + case BINARY: + case STRING: + case CHAR: + case VARCHAR: + cvList.add(new BytesColumnVector(VectorizedRowBatch.DEFAULT_SIZE)); + break; + case DECIMAL: + DecimalTypeInfo tInfo = (DecimalTypeInfo) poi.getTypeInfo(); + cvList.add(new DecimalColumnVector(VectorizedRowBatch.DEFAULT_SIZE, + tInfo.precision(), tInfo.scale())); + break; + default: + throw new HiveException("Vectorizaton is not supported for datatype:" + + poi.getPrimitiveCategory()); + } + break; + case STRUCT: + throw new HiveException("Struct not supported"); + default: + throw new HiveException("Flattening is not supported for datatype:" + + fieldObjectInspector.getCategory()); + } + } + } + + + /** + * Create VectorizedRowBatch from ObjectInspector + * + * @param oi + * @return + * @throws HiveException + */ + public static VectorizedRowBatch constructVectorizedRowBatch( + StructObjectInspector oi) throws HiveException { + final List cvList = new LinkedList(); + 
allocateColumnVector(oi, cvList); + final VectorizedRowBatch result = new VectorizedRowBatch(cvList.size()); + int i = 0; + for(ColumnVector cv : cvList) { + result.cols[i++] = cv; + } + return result; + } + + /** + * Create VectorizedRowBatch from key and value object inspectors + * The row object inspector used by ReduceWork needs to be a **standard** + * struct object inspector, not just any struct object inspector. + * @param keyInspector + * @param valueInspector + * @param vectorScratchColumnTypeMap + * @return VectorizedRowBatch, OI + * @throws HiveException + */ + public static ObjectPair constructVectorizedRowBatch( + StructObjectInspector keyInspector, StructObjectInspector valueInspector, Map vectorScratchColumnTypeMap) + throws HiveException { + + ArrayList colNames = new ArrayList(); + ArrayList ois = new ArrayList(); + List fields = keyInspector.getAllStructFieldRefs(); + for (StructField field: fields) { + colNames.add(Utilities.ReduceField.KEY.toString() + "." + field.getFieldName()); + ois.add(field.getFieldObjectInspector()); + } + fields = valueInspector.getAllStructFieldRefs(); + for (StructField field: fields) { + colNames.add(Utilities.ReduceField.VALUE.toString() + "." + field.getFieldName()); + ois.add(field.getFieldObjectInspector()); + } + StandardStructObjectInspector rowObjectInspector = ObjectInspectorFactory.getStandardStructObjectInspector(colNames, ois); + + VectorizedRowBatchCtx batchContext = new VectorizedRowBatchCtx(); + batchContext.init(vectorScratchColumnTypeMap, rowObjectInspector); + return new ObjectPair<>(batchContext.createVectorizedRowBatch(), rowObjectInspector); + } + + /** + * Iterates through all columns in a given row and populates the batch + * + * @param row + * @param oi + * @param rowIndex + * @param batch + * @param buffer + * @throws HiveException + */ + public static void addRowToBatch(Object row, StructObjectInspector oi, + int rowIndex, + VectorizedRowBatch batch, + DataOutputBuffer buffer + ) throws HiveException { + addRowToBatchFrom(row, oi, rowIndex, 0, batch, buffer); + } + + /** + * Iterates thru all the columns in a given row and populates the batch + * from a given offset + * + * @param row Deserialized row object + * @param oi Object insepector for that row + * @param rowIndex index to which the row should be added to batch + * @param colOffset offset from where the column begins + * @param batch Vectorized batch to which the row is added at rowIndex + * @throws HiveException + */ + public static void addRowToBatchFrom(Object row, StructObjectInspector oi, + int rowIndex, + int colOffset, + VectorizedRowBatch batch, + DataOutputBuffer buffer + ) throws HiveException { + List fieldRefs = oi.getAllStructFieldRefs(); + final int off = colOffset; + // Iterate thru the cols and load the batch + for (int i = 0; i < fieldRefs.size(); i++) { + setVector(row, oi, fieldRefs.get(i), batch, buffer, rowIndex, i, off); + } + } + + /** + * Add only the projected column of a regular row to the specified vectorized row batch + * @param row the regular row + * @param oi object inspector for the row + * @param rowIndex the offset to add in the batch + * @param batch vectorized row batch + * @param buffer data output buffer + * @throws HiveException + */ + public static void addProjectedRowToBatchFrom(Object row, StructObjectInspector oi, + int rowIndex, VectorizedRowBatch batch, DataOutputBuffer buffer) throws HiveException { + List fieldRefs = oi.getAllStructFieldRefs(); + for (int i = 0; i < fieldRefs.size(); i++) { + int 
projectedOutputCol = batch.projectedColumns[i]; + if (batch.cols[projectedOutputCol] == null) { + continue; + } + setVector(row, oi, fieldRefs.get(i), batch, buffer, rowIndex, projectedOutputCol, 0); + } + } + /** + * Iterates thru all the columns in a given row and populates the batch + * from a given offset + * + * @param row Deserialized row object + * @param oi Object insepector for that row + * @param rowIndex index to which the row should be added to batch + * @param batch Vectorized batch to which the row is added at rowIndex + * @param context context object for this vectorized batch + * @param buffer + * @throws HiveException + */ + public static void acidAddRowToBatch(Object row, + StructObjectInspector oi, + int rowIndex, + VectorizedRowBatch batch, + VectorizedRowBatchCtx context, + DataOutputBuffer buffer) throws HiveException { + List fieldRefs = oi.getAllStructFieldRefs(); + // Iterate thru the cols and load the batch + for (int i = 0; i < fieldRefs.size(); i++) { + if (batch.cols[i] == null) { + // This means the column was not included in the projection from the underlying read + continue; + } + if (context.isPartitionCol(i)) { + // The value will have already been set before we're called, so don't overwrite it + continue; + } + setVector(row, oi, fieldRefs.get(i), batch, buffer, rowIndex, i, 0); + } + } + + private static void setVector(Object row, + StructObjectInspector oi, + StructField field, + VectorizedRowBatch batch, + DataOutputBuffer buffer, + int rowIndex, + int colIndex, + int offset) throws HiveException { + + Object fieldData = oi.getStructFieldData(row, field); + ObjectInspector foi = field.getFieldObjectInspector(); + + // Vectorization only supports PRIMITIVE data types. Assert the same + assert (foi.getCategory() == Category.PRIMITIVE); + + // Get writable object + PrimitiveObjectInspector poi = (PrimitiveObjectInspector) foi; + Object writableCol = poi.getPrimitiveWritableObject(fieldData); + + // NOTE: The default value for null fields in vectorization is 1 for int types, NaN for + // float/double. String types have no default value for null. + switch (poi.getPrimitiveCategory()) { + case BOOLEAN: { + LongColumnVector lcv = (LongColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + lcv.vector[rowIndex] = ((BooleanWritable) writableCol).get() ? 
1 : 0; + lcv.isNull[rowIndex] = false; + } else { + lcv.vector[rowIndex] = 1; + setNullColIsNullValue(lcv, rowIndex); + } + } + break; + case BYTE: { + LongColumnVector lcv = (LongColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + lcv.vector[rowIndex] = ((ByteWritable) writableCol).get(); + lcv.isNull[rowIndex] = false; + } else { + lcv.vector[rowIndex] = 1; + setNullColIsNullValue(lcv, rowIndex); + } + } + break; + case SHORT: { + LongColumnVector lcv = (LongColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + lcv.vector[rowIndex] = ((ShortWritable) writableCol).get(); + lcv.isNull[rowIndex] = false; + } else { + lcv.vector[rowIndex] = 1; + setNullColIsNullValue(lcv, rowIndex); + } + } + break; + case INT: { + LongColumnVector lcv = (LongColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + lcv.vector[rowIndex] = ((IntWritable) writableCol).get(); + lcv.isNull[rowIndex] = false; + } else { + lcv.vector[rowIndex] = 1; + setNullColIsNullValue(lcv, rowIndex); + } + } + break; + case LONG: { + LongColumnVector lcv = (LongColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + lcv.vector[rowIndex] = ((LongWritable) writableCol).get(); + lcv.isNull[rowIndex] = false; + } else { + lcv.vector[rowIndex] = 1; + setNullColIsNullValue(lcv, rowIndex); + } + } + break; + case DATE: { + LongColumnVector lcv = (LongColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + lcv.vector[rowIndex] = ((DateWritable) writableCol).getDays(); + lcv.isNull[rowIndex] = false; + } else { + lcv.vector[rowIndex] = 1; + setNullColIsNullValue(lcv, rowIndex); + } + } + break; + case FLOAT: { + DoubleColumnVector dcv = (DoubleColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + dcv.vector[rowIndex] = ((FloatWritable) writableCol).get(); + dcv.isNull[rowIndex] = false; + } else { + dcv.vector[rowIndex] = Double.NaN; + setNullColIsNullValue(dcv, rowIndex); + } + } + break; + case DOUBLE: { + DoubleColumnVector dcv = (DoubleColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + dcv.vector[rowIndex] = ((DoubleWritable) writableCol).get(); + dcv.isNull[rowIndex] = false; + } else { + dcv.vector[rowIndex] = Double.NaN; + setNullColIsNullValue(dcv, rowIndex); + } + } + break; + case TIMESTAMP: { + LongColumnVector lcv = (LongColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + Timestamp t = ((TimestampWritable) writableCol).getTimestamp(); + lcv.vector[rowIndex] = TimestampUtils.getTimeNanoSec(t); + lcv.isNull[rowIndex] = false; + } else { + lcv.vector[rowIndex] = 1; + setNullColIsNullValue(lcv, rowIndex); + } + } + break; + case INTERVAL_YEAR_MONTH: { + LongColumnVector lcv = (LongColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + HiveIntervalYearMonth i = ((HiveIntervalYearMonthWritable) writableCol).getHiveIntervalYearMonth(); + lcv.vector[rowIndex] = i.getTotalMonths(); + lcv.isNull[rowIndex] = false; + } else { + lcv.vector[rowIndex] = 1; + setNullColIsNullValue(lcv, rowIndex); + } + } + break; + case INTERVAL_DAY_TIME: { + LongColumnVector lcv = (LongColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + HiveIntervalDayTime i = ((HiveIntervalDayTimeWritable) writableCol).getHiveIntervalDayTime(); + lcv.vector[rowIndex] = DateUtils.getIntervalDayTimeTotalNanos(i); + lcv.isNull[rowIndex] = false; + } else { + lcv.vector[rowIndex] = 1; + setNullColIsNullValue(lcv, rowIndex); + } + } + break; + case BINARY: { 
+ BytesColumnVector bcv = (BytesColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + bcv.isNull[rowIndex] = false; + BytesWritable bw = (BytesWritable) writableCol; + byte[] bytes = bw.getBytes(); + int start = buffer.getLength(); + int length = bw.getLength(); + try { + buffer.write(bytes, 0, length); + } catch (IOException ioe) { + throw new IllegalStateException("bad write", ioe); + } + bcv.setRef(rowIndex, buffer.getData(), start, length); + } else { + setNullColIsNullValue(bcv, rowIndex); + } + } + break; + case STRING: { + BytesColumnVector bcv = (BytesColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + bcv.isNull[rowIndex] = false; + Text colText = (Text) writableCol; + int start = buffer.getLength(); + int length = colText.getLength(); + try { + buffer.write(colText.getBytes(), 0, length); + } catch (IOException ioe) { + throw new IllegalStateException("bad write", ioe); + } + bcv.setRef(rowIndex, buffer.getData(), start, length); + } else { + setNullColIsNullValue(bcv, rowIndex); + } + } + break; + case CHAR: { + BytesColumnVector bcv = (BytesColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + bcv.isNull[rowIndex] = false; + HiveChar colHiveChar = ((HiveCharWritable) writableCol).getHiveChar(); + byte[] bytes = colHiveChar.getStrippedValue().getBytes(); + + // We assume the CHAR maximum length was enforced when the object was created. + int length = bytes.length; + + int start = buffer.getLength(); + try { + // In vector mode, we store CHAR as unpadded. + buffer.write(bytes, 0, length); + } catch (IOException ioe) { + throw new IllegalStateException("bad write", ioe); + } + bcv.setRef(rowIndex, buffer.getData(), start, length); + } else { + setNullColIsNullValue(bcv, rowIndex); + } + } + break; + case VARCHAR: { + BytesColumnVector bcv = (BytesColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + bcv.isNull[rowIndex] = false; + HiveVarchar colHiveVarchar = ((HiveVarcharWritable) writableCol).getHiveVarchar(); + byte[] bytes = colHiveVarchar.getValue().getBytes(); + + // We assume the VARCHAR maximum length was enforced when the object was created. 
+ int length = bytes.length; + + int start = buffer.getLength(); + try { + buffer.write(bytes, 0, length); + } catch (IOException ioe) { + throw new IllegalStateException("bad write", ioe); + } + bcv.setRef(rowIndex, buffer.getData(), start, length); + } else { + setNullColIsNullValue(bcv, rowIndex); + } + } + break; + case DECIMAL: + DecimalColumnVector dcv = (DecimalColumnVector) batch.cols[offset + colIndex]; + if (writableCol != null) { + dcv.isNull[rowIndex] = false; + HiveDecimalWritable wobj = (HiveDecimalWritable) writableCol; + dcv.set(rowIndex, wobj); + } else { + setNullColIsNullValue(dcv, rowIndex); + } + break; + default: + throw new HiveException("Vectorizaton is not supported for datatype:" + + poi.getPrimitiveCategory()); + } + } + + public static StandardStructObjectInspector convertToStandardStructObjectInspector( + StructObjectInspector structObjectInspector) throws HiveException { + + List fields = structObjectInspector.getAllStructFieldRefs(); + List oids = new ArrayList(); + ArrayList columnNames = new ArrayList(); + + for(StructField field : fields) { + TypeInfo typeInfo = TypeInfoUtils.getTypeInfoFromTypeString( + field.getFieldObjectInspector().getTypeName()); + ObjectInspector standardWritableObjectInspector = + TypeInfoUtils.getStandardWritableObjectInspectorFromTypeInfo(typeInfo); + oids.add(standardWritableObjectInspector); + columnNames.add(field.getFieldName()); + } + return ObjectInspectorFactory.getStandardStructObjectInspector(columnNames,oids); + } + + public static PrimitiveTypeInfo[] primitiveTypeInfosFromStructObjectInspector( + StructObjectInspector structObjectInspector) throws HiveException { + + List fields = structObjectInspector.getAllStructFieldRefs(); + PrimitiveTypeInfo[] result = new PrimitiveTypeInfo[fields.size()]; + + int i = 0; + for(StructField field : fields) { + TypeInfo typeInfo = TypeInfoUtils.getTypeInfoFromTypeString( + field.getFieldObjectInspector().getTypeName()); + result[i++] = (PrimitiveTypeInfo) typeInfo; + } + return result; + } + + public static PrimitiveTypeInfo[] primitiveTypeInfosFromTypeNames( + String[] typeNames) throws HiveException { + + PrimitiveTypeInfo[] result = new PrimitiveTypeInfo[typeNames.length]; + + for(int i = 0; i < typeNames.length; i++) { + TypeInfo typeInfo = TypeInfoUtils.getTypeInfoFromTypeString(typeNames[i]); + result[i] = (PrimitiveTypeInfo) typeInfo; + } + return result; + } + + /** + * Make a new (scratch) batch, which is exactly "like" the batch provided, except that it's empty + * @param batch the batch to imitate + * @return the new batch + * @throws HiveException + */ + public static VectorizedRowBatch makeLike(VectorizedRowBatch batch) throws HiveException { + VectorizedRowBatch newBatch = new VectorizedRowBatch(batch.numCols); + for (int i = 0; i < batch.numCols; i++) { + ColumnVector colVector = batch.cols[i]; + if (colVector != null) { + ColumnVector newColVector; + if (colVector instanceof LongColumnVector) { + newColVector = new LongColumnVector(); + } else if (colVector instanceof DoubleColumnVector) { + newColVector = new DoubleColumnVector(); + } else if (colVector instanceof BytesColumnVector) { + newColVector = new BytesColumnVector(); + } else if (colVector instanceof DecimalColumnVector) { + DecimalColumnVector decColVector = (DecimalColumnVector) colVector; + newColVector = new DecimalColumnVector(decColVector.precision, decColVector.scale); + } else { + throw new HiveException("Column vector class " + colVector.getClass().getName() + + " is not supported!"); + } + 
newBatch.cols[i] = newColVector; + newBatch.cols[i].init(); + } + } + newBatch.projectedColumns = Arrays.copyOf(batch.projectedColumns, batch.projectedColumns.length); + newBatch.projectionSize = batch.projectionSize; + newBatch.reset(); + return newBatch; + } + + public static String displayBytes(byte[] bytes, int start, int length) { + StringBuilder sb = new StringBuilder(); + for (int i = start; i < start + length; i++) { + char ch = (char) bytes[i]; + if (ch < ' ' || ch > '~') { + sb.append(String.format("\\%03d", bytes[i] & 0xff)); + } else { + sb.append(ch); + } + } + return sb.toString(); + } + + public static void debugDisplayOneRow(VectorizedRowBatch batch, int index, String prefix) { + StringBuilder sb = new StringBuilder(); + sb.append(prefix + " row " + index + " "); + for (int column = 0; column < batch.cols.length; column++) { + ColumnVector colVector = batch.cols[column]; + if (colVector == null) { + sb.append("(null colVector " + column + ")"); + } else { + boolean isRepeating = colVector.isRepeating; + index = (isRepeating ? 0 : index); + if (colVector.noNulls || !colVector.isNull[index]) { + if (colVector instanceof LongColumnVector) { + sb.append(((LongColumnVector) colVector).vector[index]); + } else if (colVector instanceof DoubleColumnVector) { + sb.append(((DoubleColumnVector) colVector).vector[index]); + } else if (colVector instanceof BytesColumnVector) { + BytesColumnVector bytesColumnVector = (BytesColumnVector) colVector; + byte[] bytes = bytesColumnVector.vector[index]; + int start = bytesColumnVector.start[index]; + int length = bytesColumnVector.length[index]; + if (bytes == null) { + sb.append("(Unexpected null bytes with start " + start + " length " + length + ")"); + } else { + sb.append("bytes: '" + displayBytes(bytes, start, length) + "'"); + } + } else if (colVector instanceof DecimalColumnVector) { + sb.append(((DecimalColumnVector) colVector).vector[index].toString()); + } else { + sb.append("Unknown"); + } + } else { + sb.append("NULL"); + } + } + sb.append(" "); + } + LOG.info(sb.toString()); + } + + public static void debugDisplayBatch(VectorizedRowBatch batch, String prefix) { + for (int i = 0; i < batch.size; i++) { + int index = (batch.selectedInUse ? 
batch.selected[i] : i); + debugDisplayOneRow(batch, index, prefix); + } + } +} http://git-wip-us.apache.org/repos/asf/hive/blob/26728a8a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/udf/VectorUDFArgDesc.java ---------------------------------------------------------------------- diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/udf/VectorUDFArgDesc.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/udf/VectorUDFArgDesc.java index e113980..749ddea 100644 --- a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/udf/VectorUDFArgDesc.java +++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/udf/VectorUDFArgDesc.java @@ -27,6 +27,7 @@ import org.apache.hadoop.hive.ql.metadata.HiveException; import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc; import org.apache.hadoop.hive.ql.udf.generic.GenericUDF; import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category; import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory; import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory; import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo; @@ -52,6 +53,17 @@ public class VectorUDFArgDesc implements Serializable { */ public void setConstant(ExprNodeConstantDesc expr) { isConstant = true; + if (expr != null) { + if (expr.getTypeInfo().getCategory() == Category.PRIMITIVE) { + PrimitiveCategory primitiveCategory = ((PrimitiveTypeInfo) expr.getTypeInfo()) + .getPrimitiveCategory(); + if (primitiveCategory == PrimitiveCategory.VOID) { + // Otherwise we'd create a NullWritable and that isn't what we want. + expr = null; + } + } + } + constExpr = expr; } http://git-wip-us.apache.org/repos/asf/hive/blob/26728a8a/ql/src/test/queries/clientpositive/vector_when_case_null.q ---------------------------------------------------------------------- diff --git a/ql/src/test/queries/clientpositive/vector_when_case_null.q b/ql/src/test/queries/clientpositive/vector_when_case_null.q new file mode 100644 index 0000000..a423b60 --- /dev/null +++ b/ql/src/test/queries/clientpositive/vector_when_case_null.q @@ -0,0 +1,14 @@ +set hive.explain.user=false; +SET hive.vectorized.execution.enabled=true; +SET hive.auto.convert.join=true; +set hive.fetch.task.conversion=none; + +-- SORT_QUERY_RESULTS + +create table count_case_groupby (key string, bool boolean) STORED AS orc; +insert into table count_case_groupby values ('key1', true),('key2', false),('key3', NULL),('key4', false),('key5',NULL); + +explain +SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key; + +SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key; \ No newline at end of file http://git-wip-us.apache.org/repos/asf/hive/blob/26728a8a/ql/src/test/results/clientpositive/tez/vector_select_null2.q.out ---------------------------------------------------------------------- diff --git a/ql/src/test/results/clientpositive/tez/vector_select_null2.q.out b/ql/src/test/results/clientpositive/tez/vector_select_null2.q.out new file mode 100644 index 0000000..f9dad4e --- /dev/null +++ b/ql/src/test/results/clientpositive/tez/vector_select_null2.q.out @@ -0,0 +1,95 @@ +PREHOOK: query: -- SORT_QUERY_RESULTS + +create table count_case_groupby (key string, bool boolean) STORED AS orc +PREHOOK: type: CREATETABLE +PREHOOK: Output: 
database:default +PREHOOK: Output: default@count_case_groupby +POSTHOOK: query: -- SORT_QUERY_RESULTS + +create table count_case_groupby (key string, bool boolean) STORED AS orc +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@count_case_groupby +PREHOOK: query: insert into table count_case_groupby values ('key1', true),('key2', false),('key3', NULL),('key4', false),('key5',NULL) +PREHOOK: type: QUERY +PREHOOK: Input: default@values__tmp__table__1 +PREHOOK: Output: default@count_case_groupby +POSTHOOK: query: insert into table count_case_groupby values ('key1', true),('key2', false),('key3', NULL),('key4', false),('key5',NULL) +POSTHOOK: type: QUERY +POSTHOOK: Input: default@values__tmp__table__1 +POSTHOOK: Output: default@count_case_groupby +POSTHOOK: Lineage: count_case_groupby.bool EXPRESSION [(values__tmp__table__1)values__tmp__table__1.FieldSchema(name:tmp_values_col2, type:string, comment:), ] +POSTHOOK: Lineage: count_case_groupby.key SIMPLE [(values__tmp__table__1)values__tmp__table__1.FieldSchema(name:tmp_values_col1, type:string, comment:), ] +PREHOOK: query: explain +SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +PREHOOK: type: QUERY +POSTHOOK: query: explain +SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Tez + Edges: + Reducer 2 <- Map 1 (SIMPLE_EDGE) +#### A masked pattern was here #### + Vertices: + Map 1 + Map Operator Tree: + TableScan + alias: count_case_groupby + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + Select Operator + expressions: key (type: string), CASE WHEN (bool) THEN (1) WHEN ((not bool)) THEN (0) ELSE (null) END (type: int) + outputColumnNames: _col0, _col1 + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + Group By Operator + aggregations: count(_col1) + keys: _col0 (type: string) + mode: hash + outputColumnNames: _col0, _col1 + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + Reduce Output Operator + key expressions: _col0 (type: string) + sort order: + + Map-reduce partition columns: _col0 (type: string) + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + value expressions: _col1 (type: bigint) + Reducer 2 + Reduce Operator Tree: + Group By Operator + aggregations: count(VALUE._col0) + keys: KEY._col0 (type: string) + mode: mergepartial + outputColumnNames: _col0, _col1 + Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column stats: NONE + File Output Operator + compressed: false + Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column stats: NONE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +PREHOOK: type: QUERY +PREHOOK: Input: default@count_case_groupby +#### A masked pattern was here #### +POSTHOOK: query: SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL 
END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +POSTHOOK: type: QUERY +POSTHOOK: Input: default@count_case_groupby +#### A masked pattern was here #### +key1 1 +key2 1 +key3 0 +key4 1 +key5 0 http://git-wip-us.apache.org/repos/asf/hive/blob/26728a8a/ql/src/test/results/clientpositive/tez/vector_when_case_null.q.out ---------------------------------------------------------------------- diff --git a/ql/src/test/results/clientpositive/tez/vector_when_case_null.q.out b/ql/src/test/results/clientpositive/tez/vector_when_case_null.q.out new file mode 100644 index 0000000..07a9659 --- /dev/null +++ b/ql/src/test/results/clientpositive/tez/vector_when_case_null.q.out @@ -0,0 +1,96 @@ +PREHOOK: query: -- SORT_QUERY_RESULTS + +create table count_case_groupby (key string, bool boolean) STORED AS orc +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@count_case_groupby +POSTHOOK: query: -- SORT_QUERY_RESULTS + +create table count_case_groupby (key string, bool boolean) STORED AS orc +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@count_case_groupby +PREHOOK: query: insert into table count_case_groupby values ('key1', true),('key2', false),('key3', NULL),('key4', false),('key5',NULL) +PREHOOK: type: QUERY +PREHOOK: Input: default@values__tmp__table__1 +PREHOOK: Output: default@count_case_groupby +POSTHOOK: query: insert into table count_case_groupby values ('key1', true),('key2', false),('key3', NULL),('key4', false),('key5',NULL) +POSTHOOK: type: QUERY +POSTHOOK: Input: default@values__tmp__table__1 +POSTHOOK: Output: default@count_case_groupby +POSTHOOK: Lineage: count_case_groupby.bool EXPRESSION [(values__tmp__table__1)values__tmp__table__1.FieldSchema(name:tmp_values_col2, type:string, comment:), ] +POSTHOOK: Lineage: count_case_groupby.key SIMPLE [(values__tmp__table__1)values__tmp__table__1.FieldSchema(name:tmp_values_col1, type:string, comment:), ] +PREHOOK: query: explain +SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +PREHOOK: type: QUERY +POSTHOOK: query: explain +SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Tez + Edges: + Reducer 2 <- Map 1 (SIMPLE_EDGE) +#### A masked pattern was here #### + Vertices: + Map 1 + Map Operator Tree: + TableScan + alias: count_case_groupby + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + Select Operator + expressions: key (type: string), CASE WHEN (bool) THEN (1) WHEN ((not bool)) THEN (0) ELSE (null) END (type: int) + outputColumnNames: _col0, _col1 + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + Group By Operator + aggregations: count(_col1) + keys: _col0 (type: string) + mode: hash + outputColumnNames: _col0, _col1 + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + Reduce Output Operator + key expressions: _col0 (type: string) + sort order: + + Map-reduce partition columns: _col0 (type: string) + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + value expressions: _col1 (type: bigint) + Reducer 2 + Reduce Operator Tree: + Group By Operator + aggregations: count(VALUE._col0) + keys: KEY._col0 (type: string) + mode: mergepartial + 
outputColumnNames: _col0, _col1 + Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column stats: NONE + File Output Operator + compressed: false + Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column stats: NONE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + Execution mode: vectorized + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +PREHOOK: type: QUERY +PREHOOK: Input: default@count_case_groupby +#### A masked pattern was here #### +POSTHOOK: query: SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +POSTHOOK: type: QUERY +POSTHOOK: Input: default@count_case_groupby +#### A masked pattern was here #### +key1 1 +key2 1 +key3 0 +key4 1 +key5 0 http://git-wip-us.apache.org/repos/asf/hive/blob/26728a8a/ql/src/test/results/clientpositive/vector_when_case_null.q.out ---------------------------------------------------------------------- diff --git a/ql/src/test/results/clientpositive/vector_when_case_null.q.out b/ql/src/test/results/clientpositive/vector_when_case_null.q.out new file mode 100644 index 0000000..16b2d33 --- /dev/null +++ b/ql/src/test/results/clientpositive/vector_when_case_null.q.out @@ -0,0 +1,89 @@ +PREHOOK: query: -- SORT_QUERY_RESULTS + +create table count_case_groupby (key string, bool boolean) STORED AS orc +PREHOOK: type: CREATETABLE +PREHOOK: Output: database:default +PREHOOK: Output: default@count_case_groupby +POSTHOOK: query: -- SORT_QUERY_RESULTS + +create table count_case_groupby (key string, bool boolean) STORED AS orc +POSTHOOK: type: CREATETABLE +POSTHOOK: Output: database:default +POSTHOOK: Output: default@count_case_groupby +PREHOOK: query: insert into table count_case_groupby values ('key1', true),('key2', false),('key3', NULL),('key4', false),('key5',NULL) +PREHOOK: type: QUERY +PREHOOK: Input: default@values__tmp__table__1 +PREHOOK: Output: default@count_case_groupby +POSTHOOK: query: insert into table count_case_groupby values ('key1', true),('key2', false),('key3', NULL),('key4', false),('key5',NULL) +POSTHOOK: type: QUERY +POSTHOOK: Input: default@values__tmp__table__1 +POSTHOOK: Output: default@count_case_groupby +POSTHOOK: Lineage: count_case_groupby.bool EXPRESSION [(values__tmp__table__1)values__tmp__table__1.FieldSchema(name:tmp_values_col2, type:string, comment:), ] +POSTHOOK: Lineage: count_case_groupby.key SIMPLE [(values__tmp__table__1)values__tmp__table__1.FieldSchema(name:tmp_values_col1, type:string, comment:), ] +PREHOOK: query: explain +SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +PREHOOK: type: QUERY +POSTHOOK: query: explain +SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +POSTHOOK: type: QUERY +STAGE DEPENDENCIES: + Stage-1 is a root stage + Stage-0 depends on stages: Stage-1 + +STAGE PLANS: + Stage: Stage-1 + Map Reduce + Map Operator Tree: + TableScan + alias: count_case_groupby + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + Select Operator + expressions: key (type: string), CASE WHEN (bool) THEN (1) WHEN 
((not bool)) THEN (0) ELSE (null) END (type: int) + outputColumnNames: _col0, _col1 + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + Group By Operator + aggregations: count(_col1) + keys: _col0 (type: string) + mode: hash + outputColumnNames: _col0, _col1 + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + Reduce Output Operator + key expressions: _col0 (type: string) + sort order: + + Map-reduce partition columns: _col0 (type: string) + Statistics: Num rows: 5 Data size: 452 Basic stats: COMPLETE Column stats: NONE + value expressions: _col1 (type: bigint) + Reduce Operator Tree: + Group By Operator + aggregations: count(VALUE._col0) + keys: KEY._col0 (type: string) + mode: mergepartial + outputColumnNames: _col0, _col1 + Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column stats: NONE + File Output Operator + compressed: false + Statistics: Num rows: 2 Data size: 180 Basic stats: COMPLETE Column stats: NONE + table: + input format: org.apache.hadoop.mapred.TextInputFormat + output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat + serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe + + Stage: Stage-0 + Fetch Operator + limit: -1 + Processor Tree: + ListSink + +PREHOOK: query: SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +PREHOOK: type: QUERY +PREHOOK: Input: default@count_case_groupby +#### A masked pattern was here #### +POSTHOOK: query: SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key +POSTHOOK: type: QUERY +POSTHOOK: Input: default@count_case_groupby +#### A masked pattern was here #### +key1 1 +key2 1 +key3 0 +key4 1 +key5 0
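
A note for readers of the changes above: VectorUDFArgDesc.setConstant() now drops a VOID-typed (that is, SQL NULL) constant expression to null so that no NullWritable is created for it, and the new .q.out files pin the expected aggregation results (key3 and key5, whose CASE expression is NULL on every row, count as 0). The sketch below is illustrative only, not Hive's evaluation path; it assumes hadoop-common is on the classpath for NullWritable, and the class name VoidConstantSketch and the helper countNonNull are invented for this example.

import org.apache.hadoop.io.NullWritable;

public class VoidConstantSketch {

  // COUNT-style accumulator: counts only arguments that are non-null Java objects.
  static long countNonNull(Object... values) {
    long count = 0;
    for (Object value : values) {
      if (value != null) {   // COUNT ignores SQL NULL, i.e. a plain Java null here
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // A SQL NULL represented as a Java null is skipped, as expected.
    System.out.println(countNonNull(1, 0, null));          // prints 2
    // A SQL NULL wrapped in a NullWritable is a non-null object, so it would be
    // counted; this is the behaviour the setConstant() guard avoids by dropping
    // VOID constants to null before any wrapping can happen.
    System.out.println(countNonNull(NullWritable.get()));  // prints 1, not 0
  }
}

Beyond NullWritable, nothing in the sketch is Hive or Hadoop API; it only demonstrates the null-versus-NullWritable distinction that the in-code comment in setConstant() refers to.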