hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10550) Dynamic RDD caching optimization for HoS.[Spark Branch]
Date Wed, 13 May 2015 15:15:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542079#comment-14542079
] 

Hive QA commented on HIVE-10550:
--------------------------------



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12732563/HIVE-10550.1-spark.patch

{color:red}ERROR:{color} -1 due to 125 failed/errored test(s), 8721 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did not produce
a TEST-*.xml file
TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not produce a TEST-*.xml
file
TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more - did
not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more - did
not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more -
did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more - did
not produce a TEST-*.xml file
TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more
- did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_gby_empty
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_union
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_date_udf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7_map_skew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby7_noskew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby8_map_skew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_complex_types
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_cube1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_insert_common_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_position
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_rollup1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_input1_limit
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_insert_into3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join34
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_nullsafe
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_vc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_lateral_view
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_mixed
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_move_tasks_share_dependencies
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_join_union
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multigroupby_singlemr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_table_access_keys_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union26
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union27
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union28
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union29
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union30
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union32
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union33
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union34
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_date
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_date_trim
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_lateralview
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_script
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_top_level
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_windowing
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/853/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/853/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-853/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 125 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12732563 - PreCommit-HIVE-SPARK-Build

> Dynamic RDD caching optimization for HoS.[Spark Branch]
> -------------------------------------------------------
>
>                 Key: HIVE-10550
>                 URL: https://issues.apache.org/jira/browse/HIVE-10550
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>         Attachments: HIVE-10550.1-spark.patch, HIVE-10550.1.patch
>
>
> A Hive query may try to scan the same table multi times, like self-join, self-union,
or even share the same subquery, [TPC-DS Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql]
is an example. As you may know that, Spark support cache RDD data, which mean Spark would
put the calculated RDD data in memory and get the data from memory directly for next time,
this avoid the calculation cost of this RDD(and all the cost of its dependencies) at the cost
of more memory usage. Through analyze the query context, we should be able to understand which
part of query could be shared, so that we can reuse the cached RDD in the generated Spark
job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message