hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler
Date Tue, 06 Nov 2018 17:11:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677025#comment-16677025 ]

Hive QA commented on HIVE-20512:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12947008/HIVE-20512.7.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 15060 tests executed
*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=189)
	[infer_bucket_sort_reducers_power_two.q,list_bucket_dml_10.q,orc_merge9.q,leftsemijoin_mr.q,bucket6.q,bucketmapjoin7.q,uber_reduce.q,empty_dir_in_table.q,vector_outer_join2.q,spark_explain_groupbyshuffle.q,spark_dynamic_partition_pruning.q,spark_combine_equivalent_work.q,orc_merge1.q,spark_use_op_stats.q,orc_merge_diff_fs.q,quotedid_smb.q,truncate_column_buckets.q,spark_vectorized_dynamic_partition_pruning.q,spark_in_process_launcher.q,orc_merge3.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=190)
	[infer_bucket_sort_num_buckets.q,gen_udf_example_add10.q,spark_explainuser_1.q,spark_use_ts_stats_for_mapjoin.q,orc_merge6.q,orc_merge5.q,bucketmapjoin6.q,spark_opt_shuffle_serde.q,temp_table_external.q,spark_dynamic_partition_pruning_6.q,dynamic_rdd_cache.q,auto_sortmerge_join_16.q,vector_outer_join3.q,spark_dynamic_partition_pruning_7.q,schemeAuthority.q,parallel_orderby.q,vector_outer_join1.q,load_hdfs_file_with_space_in_the_name.q,spark_dynamic_partition_pruning_recursive_mapjoin.q,spark_dynamic_partition_pruning_mapjoin_only.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=191)
	[insert_overwrite_directory2.q,spark_dynamic_partition_pruning_4.q,import_exported_table.q,vector_outer_join0.q,bucket4.q,orc_merge4.q,infer_bucket_sort_merge.q,orc_merge_incompat1.q,root_dir_external_table.q,constprog_partitioner.q,constprog_semijoin.q,external_table_with_space_in_location_path.q,spark_constprog_dpp.q,spark_dynamic_partition_pruning_3.q,load_fs2.q,infer_bucket_sort_map_operators.q,spark_dynamic_partition_pruning_2.q,vector_inner_join.q,spark_multi_insert_parallel_orderby.q,remote_script.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=192)
	[scriptfile1.q,vector_outer_join5.q,file_with_header_footer.q,input16_cc.q,bucket5.q,orc_merge2.q,reduce_deduplicate.q,schemeAuthority2.q,spark_dynamic_partition_pruning_5.q,orc_merge8.q,orc_merge_incompat2.q,infer_bucket_sort_bucketed_table.q,vector_outer_join4.q,disable_merge_for_bucketing.q,orc_merge7.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=110)
	[bucketmapjoin4.q,bucket_map_join_spark4.q,union21.q,groupby2_noskew.q,timestamp_2.q,date_join1.q,mergejoins.q,smb_mapjoin_11.q,auto_sortmerge_join_3.q,mapjoin_test_outer.q,vectorization_9.q,merge2.q,groupby6_noskew.q,auto_join_without_localtask.q,multi_join_union.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=111)
	[join_cond_pushdown_unqual4.q,union_remove_7.q,join13.q,join_vc.q,groupby_cube1.q,parquet_vectorization_2.q,bucket_map_join_spark2.q,sample3.q,smb_mapjoin_19.q,union23.q,union.q,union31.q,cbo_udf_udaf.q,ptf_decimal.q,bucketmapjoin2.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=112)
	[parallel_join1.q,union27.q,union12.q,groupby7_map_multi_single_reducer.q,varchar_join1.q,join7.q,join_reorder4.q,skewjoinopt2.q,bucketsortoptimize_insert_2.q,smb_mapjoin_17.q,script_env_var1.q,groupby7_map.q,bucketsortoptimize_insert_8.q,stats16.q,union20.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=114)
	[groupby_map_ppr.q,nullgroup4_multi_distinct.q,join_rc.q,union14.q,order2.q,smb_mapjoin_12.q,vector_cast_constant.q,union_remove_4.q,parquet_vectorization_1.q,auto_join11.q,udaf_collect_set.q,vectorization_12.q,groupby_sort_skew_1_23.q,smb_mapjoin_25.q,skewjoinopt12.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=115)
	[skewjoinopt15.q,auto_join18.q,list_bucket_dml_2.q,input1_limit.q,load_dyn_part3.q,union_remove_14.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,union10.q,bucket_map_join_tez2.q,groupby5_map_skew.q,load_dyn_part7.q,join_reorder.q,bucketmapjoin8.q,union34.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=116)
	[avro_joins.q,parquet_vectorization_8.q,auto_join14.q,vectorization_14.q,auto_join26.q,stats1.q,cbo_stats.q,union22.q,union_view.q,subquery_views.q,smb_mapjoin_22.q,stats15.q,ptf_matchpath.q,transform_ppr1.q,sample1.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=117)
	[limit_pushdown2.q,leftsemijoin_mr.q,parquet_vectorization_0.q,skewjoinopt16.q,bucket3.q,skewjoinopt13.q,auto_sortmerge_join_6.q,bucketmapjoin9.q,auto_join15.q,union_remove_24.q,join22.q,sample4.q,multi_insert_gby.q,join33.q,join_cond_pushdown_unqual2.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=118)
	[vector_decimal_aggregate.q,skewjoin_noskew.q,ppd_join3.q,auto_join23.q,join10.q,union_ppr.q,subquery_multi.q,join32.q,input18.q,cbo_simple_select.q,ptf.q,vectorized_nested_mapjoin.q,union18.q,groupby1.q,join_reorder2.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=120)
	[skewjoinopt3.q,skewjoinopt19.q,timestamp_comparison.q,bucketmapjoin_negative.q,union5.q,insert_into1.q,vectorization_4.q,parquet_vectorization_10.q,vector_left_outer_join.q,decimal_1_1.q,semijoin.q,skewjoinopt9.q,smb_mapjoin_3.q,stats10.q,rcfile_bigdata.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=121)
	[parquet_vectorization_limit.q,multi_insert_mixed.q,smb_mapjoin_4.q,join_cond_pushdown_3.q,insert1.q,union_remove_10.q,mapreduce2.q,udf_in_file.q,skewjoinopt5.q,auto_join12.q,skewjoin.q,vectorization_part_project.q,vector_count_distinct.q,nullgroup4.q,parquet_vectorization_12.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=124)
	[auto_join30.q,date_udf.q,parquet_vectorization_9.q,subquery_shared_alias.q,join16.q,bucketmapjoin7.q,subquery_nested_subquery.q,smb_mapjoin_18.q,join19.q,vector_varchar_4.q,parquet_vectorization_decimal_date.q,union6.q,cbo_subq_in.q,vectorization_part.q,vectorized_timestamp_funcs.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=125)
	[union_remove_1.q,ppd_outer_join2.q,groupby1_noskew.q,join20.q,parquet_vectorization_offset_limit.q,smb_mapjoin_13.q,groupby_rollup1.q,temp_table_gb1.q,bucket7.q,vector_string_concat.q,smb_mapjoin_6.q,metadata_only_queries.q,auto_sortmerge_join_12.q,groupby3_map_multi_distinct.q,parquet_vectorization_pushdown.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=128)
	[load_dyn_part15.q,explaindenpendencydiffengs.q,transform2.q,groupby5.q,cbo_semijoin.q,bucketmapjoin13.q,alter_merge_stats_orc.q,subquery_scalar.q,union_remove_2.q,groupby_position.q,join12.q,smb_mapjoin_8.q,subquery_select.q,join21.q,auto_join16.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=131)
	[bucketmapjoin3.q,parquet_vectorization_nested_udf.q,union_date.q,cbo_gby.q,auto_join31.q,auto_sortmerge_join_1.q,join_cond_pushdown_unqual1.q,ppd_outer_join3.q,bucket_map_join_spark3.q,union28.q,statsfs.q,escape_sortby1.q,vectorization_input_format_excludes.q,leftsemijoin.q,union_remove_6.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=133)
	[groupby_map_ppr_multi_distinct.q,vectorization_13.q,mapjoin_mapjoin.q,union2.q,groupby8_map.q,vectorization_short_regress.q,identity_project_remove_skip.q,stats5.q,groupby8_map_skew.q,nullgroup2.q,mapjoin_subquery.q,bucket2.q,smb_mapjoin_1.q,spark_union_merge.q,union_remove_8.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=134)
	[join39.q,bucketsortoptimize_insert_7.q,vector_distinct_2.q,parquet_vectorization_13.q,bucketmapjoin10.q,join11.q,join41.q,cbo_subq_not_in.q,windowing.q,join40.q,skewjoinopt7.q,stats7.q,annotate_stats_join.q,multi_insert_lateral_view.q,join_1to1.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=135)
	[timestamp_lazy.q,union29.q,runtime_skewjoin_mapjoin_spark.q,auto_join22.q,union13.q,groupby5_map.q,auto_sortmerge_join_16.q,auto_join29.q,groupby6.q,merge1.q,spark_combine_equivalent_work_2.q,union_remove_3.q,multi_insert_move_tasks_share_dependencies.q,ptf_streaming.q,join_array.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=136)
	[ppd_join4.q,vectorization_5.q,smb_mapjoin_2.q,union8.q,ppd_join_filter.q,column_access_stats.q,stats0.q,vector_between_in.q,mapjoin_distinct.q,vector_decimal_mapjoin.q,sample5.q,bucket_map_join_2.q,temp_table_join1.q,vectorized_case.q,stats_noscan_1.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=137)
	[load_dyn_part2.q,groupby3_map_skew.q,smb_mapjoin_7.q,join_cond_pushdown_2.q,groupby7_noskew_multi_single_reducer.q,vectorized_string_funcs.q,vectorization_1.q,groupby4_map_skew.q,auto_smb_mapjoin_14.q,script_env_var2.q,groupby_ppr_multi_distinct.q,pcr.q,auto_join_filters.q,join0.q,join37.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=138)
	[stats12.q,groupby4.q,union_top_level.q,stats2.q,groupby10.q,groupby4_noskew.q,mapjoin_filter_on_outerjoin.q,union19.q,union24.q,union_remove_5.q,union3.q,groupby_multi_single_reducer.q,smb_mapjoin_14.q,groupby3_noskew_multi_distinct.q,union_remove_21.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=139)
	[auto_sortmerge_join_13.q,join4.q,udf_percentile.q,join_reorder3.q,subquery_in.q,auto_join19.q,lateral_view_multi_lateralviews.q,stats14.q,auto_sortmerge_join_4.q,load_dyn_part4.q,vectorization_15.q,vectorized_ptf.q,auto_join2.q,groupby1_map_skew.q,stats18.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=141)
	[groupby_complex_types.q,groupby3_map.q,multigroupby_singlemr.q,union11.q,groupby7.q,bucketmapjoin_negative2.q,bucket_map_join_spark1.q,vectorization_div0.q,union_script.q,union_remove_17.q,auto_join_nulls.q,metadata_only_queries_with_filters.q,union25.q,load_dyn_part13.q,auto_sortmerge_join_9.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=142)
	[subquery_null_agg.q,bucketmapjoin11.q,auto_join4.q,mapjoin_decimal.q,join34.q,parquet_vectorization_5.q,join5.q,sort.q,auto_join28.q,join17.q,add_part_multiple.q,limit_pushdown.q,uniquejoin.q,groupby1_map.q,subquery_notin.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=143)
	[table_access_keys_stats.q,union_remove_9.q,parquet_vectorization_part_varchar.q,nullgroup.q,parquet_vectorization_part.q,mergejoins_mixed.q,join_nullsafe.q,stats8.q,skewjoinopt14.q,union17.q,vectorized_shufflejoin.q,groupby8_noskew.q,groupby11.q,skewjoinopt11.q,load_dyn_part11.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=148)
	[groupby2_noskew_multi_distinct.q,load_dyn_part12.q,scriptfile1.q,auto_join17.q,subquery_multiinsert.q,join_hive_626.q,tez_join_tests.q,parquet_vectorization_16.q,auto_join21.q,join_view.q,join28.q,join_cond_pushdown_4.q,vectorization_0.q,union_null.q,auto_join3.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=149)
	[union_remove_15.q,bucket_map_join_tez1.q,groupby7_noskew.q,bucketmapjoin1.q,parquet_vectorization_7.q,auto_join8.q,auto_join6.q,groupby2_map_skew.q,lateral_view_explode2.q,load_dyn_part1.q,skewjoinopt17.q,skewjoin_union_remove_1.q,auto_join32.q,union_remove_20.q,bucketmapjoin5.q]
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/14770/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14770/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14770/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 30 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12947008 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -------------------------------------------------------------
>
>                 Key: HIVE-20512
>                 URL: https://issues.apache.org/jira/browse/HIVE-20512
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Bharathkrishna Guruvayoor Murali
>            Priority: Major
>         Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, HIVE-20512.6.patch, HIVE-20512.7.patch
>
>
> We currently log memory usage and the number of records processed in Spark tasks, but we should improve how frequently we log this information. Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
>   // A very simple counter to keep track of the number of rows processed by the
>   // reducer. It dumps every 1 million rows, and more frequently before that.
>   if (currentThreshold >= 1000000) {
>     return currentThreshold + 1000000;
>   }
>   return 10 * currentThreshold;
> }
> {code}
> The issue is that after a while, the increase by a factor of 10 means that you have to process a huge number of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would help in debugging tasks that are seemingly hung.
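
The interval-based approach proposed in the description could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in any of the attached patches: the class name `IntervalLogGate`, its methods, and the injectable clock are all hypothetical and do not come from Hive.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Hive): decides when to emit a progress
 * log line based on elapsed time rather than on a row-count threshold.
 */
class IntervalLogGate {
    private final long intervalMillis;
    private final LongSupplier clockMillis; // injectable so the logic can be tested
    private long nextLogAtMillis;
    private long rowCount;

    IntervalLogGate(long intervalMillis, LongSupplier clockMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        this.intervalMillis = intervalMillis;
        this.clockMillis = clockMillis;
        this.nextLogAtMillis = clockMillis.getAsLong() + intervalMillis;
    }

    /**
     * Counts one processed row. Returns true when at least one full interval
     * has passed since construction or since the last time this returned true.
     */
    boolean recordRow() {
        rowCount++;
        long now = clockMillis.getAsLong();
        if (now < nextLogAtMillis) {
            return false;
        }
        // Schedule the next log relative to now, so a long stall does not
        // cause a burst of back-to-back log lines afterwards.
        nextLogAtMillis = now + intervalMillis;
        return true;
    }

    long getRowCount() {
        return rowCount;
    }
}
```

A caller would construct the gate with a monotonic clock, for example `new IntervalLogGate(60_000L, () -> System.nanoTime() / 1_000_000L)`, and log the row count and memory usage whenever `recordRow()` returns true. Note that this sketch only checks the clock when a row arrives, so a task that receives no rows at all would still log nothing; covering that case would need a separate timer thread such as a `ScheduledExecutorService`.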



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
