hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8202) Support SMB Join for Hive on Spark [Spark Branch]
Date Tue, 28 Oct 2014 03:38:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186297#comment-14186297
] 

Hive QA commented on HIVE-8202:
-------------------------------



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12677502/HIVE-8202.3-spark.patch

{color:red}ERROR:{color} -1 due to 124 failed/errored test(s), 7037 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_smb_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join38
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_merge_multi_expressions
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_vc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_filter_on_outerjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapreduce1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapreduce2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_merge1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_merge2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_lateral_view
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_mixed
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_move_tasks_share_dependencies
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multigroupby_singlemr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_order
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parallel
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parallel_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_transform
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sample6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_script_pipe
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin_noskew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_tez_join_tests
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_tez_joins_explain
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union28
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union30
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union33
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_count_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_data_types
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_left_outer_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_orderby_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_div0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_part_project
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_pushdown
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_shufflejoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_timestamp_funcs
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/267/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/267/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-267/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 124 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12677502 - PreCommit-HIVE-SPARK-Build

> Support SMB Join for Hive on Spark [Spark Branch]
> -------------------------------------------------
>
>                 Key: HIVE-8202
>                 URL: https://issues.apache.org/jira/browse/HIVE-8202
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Szehon Ho
>         Attachments: HIVE-8202.1-spark.patch, HIVE-8202.2-spark.patch, HIVE-8202.3-spark.patch,
Hive on Spark SMB Join.docx, Hive on Spark SMB Join.pdf
>
>
> SMB joins are used wherever the tables are sorted and bucketed. It's a reduce-side join.
The join boils down to just merging the already sorted tables, allowing this operation to
be faster than an ordinary map-join. However, if the tables are partitioned, there could be
a slow down as each mapper would need to get a very small chunk of a partition which has a
single key. Thus, in some scenarios it's beneficial to convert SMB join to SMB map join as
well.
> The task is to research and support the conversion from regular SMB join to SMB map join
for Spark execution engine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message