Date: Tue, 28 Oct 2014 03:38:34 +0000 (UTC)
From: "Hive QA (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-8202) Support SMB Join for Hive on Spark [Spark Branch]

    [ https://issues.apache.org/jira/browse/HIVE-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186297#comment-14186297 ]

Hive QA commented on HIVE-8202:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12677502/HIVE-8202.3-spark.patch

{color:red}ERROR:{color} -1 due to 124 failed/errored test(s), 7037 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_smb_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join38
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_merge_multi_expressions
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_vc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_filter_on_outerjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapreduce1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapreduce2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_merge1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_merge2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_lateral_view
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_mixed
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_move_tasks_share_dependencies
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multigroupby_singlemr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_order
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parallel
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parallel_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_transform
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sample6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_script_pipe
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin_noskew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_sort_merge_join_desc_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_tez_join_tests
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_tez_joins_explain
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union28
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union30
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union33
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_count_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_data_types
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_left_outer_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_orderby_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_div0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_part_project
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_pushdown
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_shufflejoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_timestamp_funcs
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/267/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/267/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-267/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 124 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12677502 - PreCommit-HIVE-SPARK-Build

> Support SMB Join for Hive on Spark [Spark Branch]
> -------------------------------------------------
>
>                 Key: HIVE-8202
>                 URL: https://issues.apache.org/jira/browse/HIVE-8202
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Szehon Ho
>         Attachments: HIVE-8202.1-spark.patch, HIVE-8202.2-spark.patch, HIVE-8202.3-spark.patch, Hive on Spark SMB Join.docx, Hive on Spark SMB Join.pdf
>
>
> SMB joins are used wherever the tables are sorted and bucketed. It's a map-side join. The join boils down to just merging the already sorted tables, allowing this operation to be faster than an ordinary map-join. However, if the tables are partitioned, there could be a slowdown, as each mapper would need to get a very small chunk of a partition which has a single key. Thus, in some scenarios it's beneficial to convert SMB join to SMB map join as well.
> The task is to research and support the conversion from regular SMB join to SMB map join for the Spark execution engine.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)