Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Content-Type: multipart/alternative; boundary="===============1984099480357880247=="
MIME-Version: 1.0
Subject: Re: Review Request 27265: Support SMB Join for Hive on Spark [Spark Branch]
From: "Szehon Ho"
To: "Szehon Ho", "Jimmy Xiang", "hive", "Suhas Satish"
Date: Tue, 28 Oct 2014 22:32:38 -0000
Message-ID: <20141028223238.7138.44722@reviews.apache.org>
In-Reply-To: <20141028022029.7138.46006@reviews.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/27265/
X-ReviewRequest-Repository: hive-git

--===============1984099480357880247==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is
an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27265/
-----------------------------------------------------------

(Updated Oct. 28, 2014, 10:32 p.m.)


Review request for hive.


Changes
-------

Fix tests and address review comments.


Repository: hive-git


Description
-------

This change reuses the SMBJoinOperator for Spark.

Background: the logical layer already converts joins to SMB joins. This change just introduces a class called "SparkSortMergeJoinFactory" on the Spark compile path, which attaches the data structures (such as local work and bucket info) to the MapWork for the SMBJoinOperator to consume. It is largely based on the MapReduce class "MapJoinFactory". However, on the Spark path it is activated only for SMB joins and not for map joins, as we have another strategy for map joins. That is why there is a new optimizer rule called "TypeRule": it ensures this processor runs only on SMBJoinOperators (which share the same name as MapJoinOperators; the shared name is needed by the logical optimizers that deal with hints).

One major assumption behind the whole SMB concept is that both tables have corresponding buckets. While testing with large numbers of buckets (as in auto_sortmerge_join_16), I found that "insert" into a bucketed table was not putting the same keys into corresponding buckets. I activated MR-style shuffle (hash shuffle instead of total-order shuffle), and that seemed to solve the issue.

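The bucket-correspondence assumption above can be illustrated with a small, self-contained sketch. This is not Hive or Spark code: the class BucketCorrespondenceSketch and all of its methods are hypothetical, and the hash and split-point logic are simplified stand-ins for the real shuffle implementations. The description only reports that switching to hash shuffle seemed to solve the issue; the sketch merely shows one way a data-dependent (total-order) assignment could put the same key in different buckets for two tables, whereas a hash assignment with the same bucket count cannot.

```java
import java.util.Arrays;

// Hypothetical illustration only; not taken from the Hive or Spark code base.
public class BucketCorrespondenceSketch {

    // Hash-style assignment: depends only on the key and the bucket count,
    // so two tables with the same bucket count agree on every key.
    static int hashBucket(int key, int numBuckets) {
        return Math.floorMod(Integer.hashCode(key), numBuckets);
    }

    // Simplified stand-in for sampling: pick evenly spaced split points
    // from the sorted keys of the data being written.
    static int[] splitPoints(int[] keys, int numBuckets) {
        int[] sorted = keys.clone();
        Arrays.sort(sorted);
        int[] points = new int[numBuckets - 1];
        for (int i = 1; i < numBuckets; i++) {
            points[i - 1] = sorted[i * sorted.length / numBuckets];
        }
        return points;
    }

    // Range-style (total-order) assignment: the bucket is the number of
    // split points that are less than or equal to the key.
    static int rangeBucket(int key, int[] points) {
        int bucket = 0;
        while (bucket < points.length && key >= points[bucket]) {
            bucket++;
        }
        return bucket;
    }

    public static void main(String[] args) {
        int numBuckets = 2;
        int[] tableA = {1, 2, 3, 4};   // made-up keys
        int[] tableB = {3, 4, 50, 60}; // made-up keys with a different distribution
        int key = 3;                   // present in both tables

        System.out.println("range bucket in A: " + rangeBucket(key, splitPoints(tableA, numBuckets)));
        System.out.println("range bucket in B: " + rangeBucket(key, splitPoints(tableB, numBuckets)));
        System.out.println("hash bucket in A and B: " + hashBucket(key, numBuckets));
    }
}
```

With these made-up inputs, key 3 lands in different range buckets for the two tables because the split points follow each table's own key distribution, while the hash bucket is the same for both. A bucket-by-bucket merge join can only line up matching keys in the second case.
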
Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties 00c9f4d
  ql/src/java/org/apache/hadoop/hive/ql/lib/TypeRule.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ae1d1ab
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinFactory.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java ed88c60
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 8e28887
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 4f5feca
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 1c663c4
  ql/src/test/queries/clientpositive/parallel_join0.q 5180947
  ql/src/test/results/clientpositive/spark/auto_join0.q.out 76ff63d
  ql/src/test/results/clientpositive/spark/auto_join10.q.out 05a5912
  ql/src/test/results/clientpositive/spark/auto_join11.q.out 998c28b
  ql/src/test/results/clientpositive/spark/auto_join12.q.out d2b7993
  ql/src/test/results/clientpositive/spark/auto_join13.q.out 78aa01e
  ql/src/test/results/clientpositive/spark/auto_join15.q.out 5916070
  ql/src/test/results/clientpositive/spark/auto_join16.q.out 0b6807d
  ql/src/test/results/clientpositive/spark/auto_join18.q.out 6083b38
  ql/src/test/results/clientpositive/spark/auto_join18_multi_distinct.q.out 01c8f0a
  ql/src/test/results/clientpositive/spark/auto_join20.q.out a8f2b9a
  ql/src/test/results/clientpositive/spark/auto_join21.q.out f9ac35d
  ql/src/test/results/clientpositive/spark/auto_join22.q.out 516322c
  ql/src/test/results/clientpositive/spark/auto_join23.q.out ce5a670
  ql/src/test/results/clientpositive/spark/auto_join24.q.out 15b8888
  ql/src/test/results/clientpositive/spark/auto_join27.q.out 67f5739
  ql/src/test/results/clientpositive/spark/auto_join28.q.out b979661
  ql/src/test/results/clientpositive/spark/auto_join29.q.out 0951b8d
  ql/src/test/results/clientpositive/spark/auto_join30.q.out 98b3974
  ql/src/test/results/clientpositive/spark/auto_join31.q.out df502c8
  ql/src/test/results/clientpositive/spark/auto_join32.q.out 8d83188
  ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out e64d4fb
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out 9158d65
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_10.q.out f608cc5
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_11.q.out 3c26363
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_12.q.out 65e496f
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_13.q.out a5a281b
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_14.q.out 2fc3bb6
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_15.q.out 74cbd7c
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out d1bb7a0
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out d57a1d7
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out 8244c50
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out 2ab1bca
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_6.q.out bc4a163
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out 16ef3ae
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out 9fd3e5a
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_9.q.out a7f994f
  ql/src/test/results/clientpositive/spark/bucket2.q.out b1b2997
  ql/src/test/results/clientpositive/spark/bucket3.q.out 019c11a
  ql/src/test/results/clientpositive/spark/bucket4.q.out 2cbab11
  ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out 4ec619e
  ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out 1c288c2
  ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 8be3edd
  ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out 9e45843
  ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out 0c1ac4b
  ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out dc1b8cf
  ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out 6d72fdf
  ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out d80bdcf
  ql/src/test/results/clientpositive/spark/count.q.out c527c1d
  ql/src/test/results/clientpositive/spark/ctas.q.out 0ded266
  ql/src/test/results/clientpositive/spark/disable_merge_for_bucketing.q.out 590b265
  ql/src/test/results/clientpositive/spark/escape_clusterby1.q.out 52bdf6a
  ql/src/test/results/clientpositive/spark/escape_distributeby1.q.out 736db5e
  ql/src/test/results/clientpositive/spark/escape_orderby1.q.out 6e1c0cf
  ql/src/test/results/clientpositive/spark/escape_sortby1.q.out 58b663c
  ql/src/test/results/clientpositive/spark/groupby1.q.out 847f45c
  ql/src/test/results/clientpositive/spark/groupby10.q.out 2095843
  ql/src/test/results/clientpositive/spark/groupby11.q.out 70db5a5
  ql/src/test/results/clientpositive/spark/groupby2.q.out 86e2f2a
  ql/src/test/results/clientpositive/spark/groupby3.q.out 13a5fab
  ql/src/test/results/clientpositive/spark/groupby3_map.q.out dac2824
  ql/src/test/results/clientpositive/spark/groupby3_map_multi_distinct.q.out d2c054a
  ql/src/test/results/clientpositive/spark/groupby3_map_skew.q.out ec6439a
  ql/src/test/results/clientpositive/spark/groupby3_noskew.q.out 0c9a7e1
  ql/src/test/results/clientpositive/spark/groupby3_noskew_multi_distinct.q.out 42fbb8c
  ql/src/test/results/clientpositive/spark/groupby4.q.out 318c5a3
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out 22a05b5
  ql/src/test/results/clientpositive/spark/groupby7_map_multi_single_reducer.q.out bc453c6
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 2a07f2a
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 00a0707
  ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 36640ef
  ql/src/test/results/clientpositive/spark/groupby8.q.out d8295ce
  ql/src/test/results/clientpositive/spark/groupby8_map.q.out b9aa597
  ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out b9aa597
  ql/src/test/results/clientpositive/spark/groupby8_noskew.q.out b9aa597
  ql/src/test/results/clientpositive/spark/groupby9.q.out bec2346
  ql/src/test/results/clientpositive/spark/groupby_complex_types.q.out 16fadea
  ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out 7470843
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 169c4ac
  ql/src/test/results/clientpositive/spark/groupby_multi_insert_common_distinct.q.out d3457da
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 3abd0e3
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer2.q.out 7f74c62
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer3.q.out c4b7419
  ql/src/test/results/clientpositive/spark/groupby_position.q.out 9e58189
  ql/src/test/results/clientpositive/spark/groupby_ppr.q.out 860aa58
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 0aeff6b
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 61dd2be
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 99da734
  ql/src/test/results/clientpositive/spark/having.q.out 5e9f20d
  ql/src/test/results/clientpositive/spark/input14.q.out e7d4db6
  ql/src/test/results/clientpositive/spark/input17.q.out 0882a29
  ql/src/test/results/clientpositive/spark/input18.q.out 802fb0a
  ql/src/test/results/clientpositive/spark/input1_limit.q.out 33ecd07
  ql/src/test/results/clientpositive/spark/insert_into1.q.out e9be658
  ql/src/test/results/clientpositive/spark/insert_into2.q.out 5c8e9c7
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 6c0111d
  ql/src/test/results/clientpositive/spark/join0.q.out 55b725e
  ql/src/test/results/clientpositive/spark/join15.q.out 1651db1
  ql/src/test/results/clientpositive/spark/join18.q.out 7b64fb6
  ql/src/test/results/clientpositive/spark/join18_multi_distinct.q.out 57c4516
  ql/src/test/results/clientpositive/spark/join20.q.out f06ffac
  ql/src/test/results/clientpositive/spark/join21.q.out e81ec5a
  ql/src/test/results/clientpositive/spark/join23.q.out 3982ea7
  ql/src/test/results/clientpositive/spark/join29.q.out d5383d5
  ql/src/test/results/clientpositive/spark/join30.q.out 5c16622
  ql/src/test/results/clientpositive/spark/join31.q.out 9193df9
  ql/src/test/results/clientpositive/spark/join35.q.out 1750aec
  ql/src/test/results/clientpositive/spark/join38.q.out cef8a84
  ql/src/test/results/clientpositive/spark/join_merge_multi_expressions.q.out 8e924be
  ql/src/test/results/clientpositive/spark/join_vc.q.out 6e34ef3
  ql/src/test/results/clientpositive/spark/limit_pushdown.q.out 94b38f7
  ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out 5dd5fad
  ql/src/test/results/clientpositive/spark/load_dyn_part2.q.out f8f8971
  ql/src/test/results/clientpositive/spark/mapjoin_distinct.q.out 9f66974
  ql/src/test/results/clientpositive/spark/mapjoin_mapjoin.q.out 1801d13
  ql/src/test/results/clientpositive/spark/mapreduce1.q.out 1824126
  ql/src/test/results/clientpositive/spark/mapreduce2.q.out 792a0c8
  ql/src/test/results/clientpositive/spark/merge1.q.out c50a80b
  ql/src/test/results/clientpositive/spark/merge2.q.out aec97a3
  ql/src/test/results/clientpositive/spark/metadata_only_queries.q.out 7e32dd6
  ql/src/test/results/clientpositive/spark/multi_insert.q.out 2b9f90e
  ql/src/test/results/clientpositive/spark/multi_insert_gby.q.out 7d6d58b
  ql/src/test/results/clientpositive/spark/multi_insert_gby2.q.out fca3e1d
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out ce78fba
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out bca846a
  ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out 819b265
  ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 7e768e4
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 2d0c4d7
  ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out d9de8d9
  ql/src/test/results/clientpositive/spark/order.q.out 3c5b169
  ql/src/test/results/clientpositive/spark/order2.q.out 8999399
  ql/src/test/results/clientpositive/spark/parallel.q.out 32d7ff1
  ql/src/test/results/clientpositive/spark/parallel_join0.q.out 46a93cc
  ql/src/test/results/clientpositive/spark/parquet_join.q.out d5a8684
  ql/src/test/results/clientpositive/spark/pcr.q.out 4e9244f
  ql/src/test/results/clientpositive/spark/ppd_gby_join.q.out c5c34c1
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out 282277a
  ql/src/test/results/clientpositive/spark/sample10.q.out c511152
  ql/src/test/results/clientpositive/spark/sample6.q.out f6256f5
  ql/src/test/results/clientpositive/spark/script_pipe.q.out 5b966ff
  ql/src/test/results/clientpositive/spark/semijoin.q.out 18fc837
  ql/src/test/results/clientpositive/spark/skewjoin.q.out d674d04
  ql/src/test/results/clientpositive/spark/skewjoin_noskew.q.out d45cdd3
  ql/src/test/results/clientpositive/spark/skewjoinopt1.q.out 47ebe96
  ql/src/test/results/clientpositive/spark/skewjoinopt15.q.out e197185
  ql/src/test/results/clientpositive/spark/skewjoinopt9.q.out 8188487
  ql/src/test/results/clientpositive/spark/smb_mapjoin_13.q.out 0319137
  ql/src/test/results/clientpositive/spark/smb_mapjoin_14.q.out cad4063
  ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out 7849e78
  ql/src/test/results/clientpositive/spark/smb_mapjoin_16.q.out 11ffefd
  ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out 482268c
  ql/src/test/results/clientpositive/spark/smb_mapjoin_20.q.out 292f596
  ql/src/test/results/clientpositive/spark/smb_mapjoin_21.q.out 8bc5dd6
  ql/src/test/results/clientpositive/spark/smb_mapjoin_25.q.out efa38d4
  ql/src/test/results/clientpositive/spark/sort.q.out 04f6c32
  ql/src/test/results/clientpositive/spark/sort_merge_join_desc_1.q.out 32c3818
  ql/src/test/results/clientpositive/spark/sort_merge_join_desc_2.q.out ae08516
  ql/src/test/results/clientpositive/spark/sort_merge_join_desc_3.q.out 6add9f9
  ql/src/test/results/clientpositive/spark/sort_merge_join_desc_4.q.out b810a56
  ql/src/test/results/clientpositive/spark/sort_merge_join_desc_5.q.out f59d942
  ql/src/test/results/clientpositive/spark/sort_merge_join_desc_6.q.out 4085d9a
  ql/src/test/results/clientpositive/spark/sort_merge_join_desc_7.q.out 28336c5
  ql/src/test/results/clientpositive/spark/sort_merge_join_desc_8.q.out 087a89d
  ql/src/test/results/clientpositive/spark/subquery_in.q.out 323c894
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2bedd37
  ql/src/test/results/clientpositive/spark/temp_table.q.out a126fc7
  ql/src/test/results/clientpositive/spark/tez_join_tests.q.out 9254944
  ql/src/test/results/clientpositive/spark/tez_joins_explain.q.out d2b23ad
  ql/src/test/results/clientpositive/spark/transform_ppr1.q.out 5309ade
  ql/src/test/results/clientpositive/spark/transform_ppr2.q.out 2dc285f
  ql/src/test/results/clientpositive/spark/union10.q.out 59ebb0c
  ql/src/test/results/clientpositive/spark/union11.q.out 40e29b8
  ql/src/test/results/clientpositive/spark/union14.q.out 5e8fdd8
  ql/src/test/results/clientpositive/spark/union15.q.out 1c35ead
  ql/src/test/results/clientpositive/spark/union16.q.out c35ed10
  ql/src/test/results/clientpositive/spark/union18.q.out f1c69bf
  ql/src/test/results/clientpositive/spark/union19.q.out c86afb0
  ql/src/test/results/clientpositive/spark/union2.q.out da8d154
  ql/src/test/results/clientpositive/spark/union23.q.out 9e26762
  ql/src/test/results/clientpositive/spark/union25.q.out 07ba875
  ql/src/test/results/clientpositive/spark/union28.q.out f668ff8
  ql/src/test/results/clientpositive/spark/union3.q.out ba21367
  ql/src/test/results/clientpositive/spark/union30.q.out ee0daf4
  ql/src/test/results/clientpositive/spark/union33.q.out ca08e0c
  ql/src/test/results/clientpositive/spark/union4.q.out 2e46204
  ql/src/test/results/clientpositive/spark/union5.q.out 5512b50
  ql/src/test/results/clientpositive/spark/union6.q.out 01f044e
  ql/src/test/results/clientpositive/spark/union7.q.out b5a693e
  ql/src/test/results/clientpositive/spark/union9.q.out db14477
  ql/src/test/results/clientpositive/spark/union_ppr.q.out 9ed0c86
  ql/src/test/results/clientpositive/spark/union_remove_1.q.out 85ec617
  ql/src/test/results/clientpositive/spark/union_remove_10.q.out f4601fd
  ql/src/test/results/clientpositive/spark/union_remove_15.q.out fe2dcd1
  ql/src/test/results/clientpositive/spark/union_remove_16.q.out 6902aeb
  ql/src/test/results/clientpositive/spark/union_remove_18.q.out b901363
  ql/src/test/results/clientpositive/spark/union_remove_19.q.out ea6ff41
  ql/src/test/results/clientpositive/spark/union_remove_2.q.out 8e58c46
  ql/src/test/results/clientpositive/spark/union_remove_20.q.out 13085a4
  ql/src/test/results/clientpositive/spark/union_remove_21.q.out 2cc70c5
  ql/src/test/results/clientpositive/spark/union_remove_24.q.out 5913610
  ql/src/test/results/clientpositive/spark/union_remove_25.q.out 89b8bac
  ql/src/test/results/clientpositive/spark/union_remove_4.q.out 909a924
  ql/src/test/results/clientpositive/spark/union_remove_5.q.out fb24181
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out 73f038b
  ql/src/test/results/clientpositive/spark/union_remove_7.q.out 49ec685
  ql/src/test/results/clientpositive/spark/union_remove_8.q.out e98957d
  ql/src/test/results/clientpositive/spark/union_remove_9.q.out 66fe4e9
  ql/src/test/results/clientpositive/spark/vector_between_in.q.out f0d2ac7
  ql/src/test/results/clientpositive/spark/vector_cast_constant.q.out 2dd7aab
  ql/src/test/results/clientpositive/spark/vector_count_distinct.q.out 8b6a226
  ql/src/test/results/clientpositive/spark/vector_data_types.q.out 5758c4b
  ql/src/test/results/clientpositive/spark/vector_decimal_aggregate.q.out 3c6d561
  ql/src/test/results/clientpositive/spark/vector_orderby_5.q.out 3b89885
  ql/src/test/results/clientpositive/spark/vectorization_0.q.out 82e4926
  ql/src/test/results/clientpositive/spark/vectorization_14.q.out 2d49a0c
  ql/src/test/results/clientpositive/spark/vectorization_15.q.out f9f4476
  ql/src/test/results/clientpositive/spark/vectorization_9.q.out 80a93f4
  ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 6f19862
  ql/src/test/results/clientpositive/spark/vectorization_part_project.q.out aa87dd9
  ql/src/test/results/clientpositive/spark/vectorization_pushdown.q.out a785497
  ql/src/test/results/clientpositive/spark/vectorized_mapjoin.q.out 5b9205b
  ql/src/test/results/clientpositive/spark/vectorized_nested_mapjoin.q.out 1c226fd
  ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 1d11b30
  ql/src/test/results/clientpositive/spark/vectorized_shufflejoin.q.out 5b9205b
  ql/src/test/results/clientpositive/spark/vectorized_timestamp_funcs.q.out cd43197

Diff: https://reviews.apache.org/r/27265/diff/


Testing
-------

Ran existing auto_sortmerge_* tests.


Thanks,

Szehon Ho

--===============1984099480357880247==--