Return-Path: X-Original-To: apmail-hive-dev-archive@www.apache.org Delivered-To: apmail-hive-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 2335618D8C for ; Sun, 24 May 2015 01:50:13 +0000 (UTC) Received: (qmail 82922 invoked by uid 500); 24 May 2015 01:50:11 -0000 Delivered-To: apmail-hive-dev-archive@hive.apache.org Received: (qmail 82851 invoked by uid 500); 24 May 2015 01:50:11 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 82836 invoked by uid 99); 24 May 2015 01:50:11 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 24 May 2015 01:50:11 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id 896561CC3D2; Sun, 24 May 2015 01:50:10 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============3541363232690031661==" MIME-Version: 1.0 Subject: Re: Review Request 34576: Bucketized Table feature fails in some cases From: "Xuefu Zhang" To: "John Pullokkaran" Cc: "pengcheng xiong" , "Xuefu Zhang" , "hive" Date: Sun, 24 May 2015 01:50:10 -0000 Message-ID: <20150524015010.17212.8414@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: "Xuefu Zhang" X-ReviewGroup: hive X-ReviewRequest-URL: https://reviews.apache.org/r/34576/ X-Sender: "Xuefu Zhang" References: <20150523174740.17474.82205@reviews.apache.org> In-Reply-To: <20150523174740.17474.82205@reviews.apache.org> Reply-To: "Xuefu Zhang" X-ReviewRequest-Repository: hive-git --===============3541363232690031661== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit 
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34576/#review85081 ----------------------------------------------------------- Could you also link the JIRA number in the review request? ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java nit: remove tab/space ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java The warning is appropriate, but I think the wording should say "might", because the source data might already be bucketed to match the target, in which case there is no problem. - Xuefu Zhang On May 23, 2015, 5:47 p.m., pengcheng xiong wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/34576/ > ----------------------------------------------------------- > > (Updated May 23, 2015, 5:47 p.m.) > > > Review request for hive and John Pullokkaran. > > > Repository: hive-git > > > Description > ------- > > Bucketized Table feature fails in some cases. If the source and destination are bucketed on the same key, and the actual data in the source is not bucketed (because it was loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed when written to the destination. > Example > ---------------------------------------------------------------------- > CREATE TABLE P1(key STRING, val STRING) > CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; > LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1; > -- perform an insert to make sure there are 2 files > INSERT OVERWRITE TABLE P1 select key, val from P1; > -------------------------------------------------- > This is not a regression. This has never worked. > This was only discovered due to Hadoop2 changes. > In Hadoop1, in local mode, the number of reducers will always be 1, regardless of what the application requests. 
Hadoop2 now honors the number-of-reducers setting in local mode (by spawning threads). > The long-term solution seems to be to prevent LOAD DATA for bucketed tables. > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java e53933e > ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 1a9b42b > ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out 623c2e8 > ql/src/test/results/clientnegative/bucket_mapjoin_wrong_table_metadata_1.q.out f4522d2 > ql/src/test/results/clientnegative/bucket_mapjoin_wrong_table_metadata_2.q.out 9aa9b5d > ql/src/test/results/clientnegative/exim_11_nonpart_noncompat_sorting.q.out 9220c8e > ql/src/test/results/clientpositive/auto_join32.q.out bfc8be8 > ql/src/test/results/clientpositive/auto_join_filters.q.out a6720d9 > ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 383defd > ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out e6e7ef3 > ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out e9fb705 > ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out c089419 > ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out 6e443fa > ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out feaea04 > ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out f64ecf0 > ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e89f548 > ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 44c037f > ql/src/test/results/clientpositive/bucket_map_join_1.q.out d778203 > ql/src/test/results/clientpositive/bucket_map_join_2.q.out aef77aa > ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out 870ecdd > ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out 33f5c46 > ql/src/test/results/clientpositive/bucket_map_join_spark3.q.out 067d1ff > ql/src/test/results/clientpositive/bucketcontext_1.q.out 77bfcf9 > ql/src/test/results/clientpositive/bucketcontext_2.q.out a9db13d > 
ql/src/test/results/clientpositive/bucketcontext_3.q.out 9ba3e0c > ql/src/test/results/clientpositive/bucketcontext_4.q.out a2b37a8 > ql/src/test/results/clientpositive/bucketcontext_5.q.out 3ee1f0e > ql/src/test/results/clientpositive/bucketcontext_6.q.out d2304fa > ql/src/test/results/clientpositive/bucketcontext_7.q.out 1a105ed > ql/src/test/results/clientpositive/bucketcontext_8.q.out 138e415 > ql/src/test/results/clientpositive/bucketizedhiveinputformat_auto.q.out 215efdd > ql/src/test/results/clientpositive/bucketmapjoin1.q.out 72f2a07 > ql/src/test/results/clientpositive/bucketmapjoin10.q.out b0e849d > ql/src/test/results/clientpositive/bucketmapjoin11.q.out 4263cab > ql/src/test/results/clientpositive/bucketmapjoin12.q.out bcd7394 > ql/src/test/results/clientpositive/bucketmapjoin2.q.out a8d9e9d > ql/src/test/results/clientpositive/bucketmapjoin3.q.out c759f05 > ql/src/test/results/clientpositive/bucketmapjoin4.q.out f61500c > ql/src/test/results/clientpositive/bucketmapjoin5.q.out 0cb2825 > ql/src/test/results/clientpositive/bucketmapjoin7.q.out 667a9db > ql/src/test/results/clientpositive/bucketmapjoin8.q.out 252b377 > ql/src/test/results/clientpositive/bucketmapjoin9.q.out 5e28dc3 > ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out 6ae127d > ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out 4c9f54a > ql/src/test/results/clientpositive/bucketmapjoin_negative3.q.out 9a0bfc4 > ql/src/test/results/clientpositive/groupby_sort_1_23.q.out 34cd1ff > ql/src/test/results/clientpositive/groupby_sort_2.q.out b5e52f1 > ql/src/test/results/clientpositive/groupby_sort_3.q.out c16911a > ql/src/test/results/clientpositive/groupby_sort_4.q.out a6b1c3d > ql/src/test/results/clientpositive/groupby_sort_5.q.out 369e2b5 > ql/src/test/results/clientpositive/groupby_sort_7.q.out 7264695 > ql/src/test/results/clientpositive/groupby_sort_8.q.out ec16eb0 > ql/src/test/results/clientpositive/groupby_sort_9.q.out e49781a > 
ql/src/test/results/clientpositive/groupby_sort_skew_1_23.q.out 0d631ce > ql/src/test/results/clientpositive/groupby_sort_test_1.q.out 8c1765d > ql/src/test/results/clientpositive/insert_orig_table.q.out 5eea74d > ql/src/test/results/clientpositive/insert_values_orig_table.q.out 684cd1b > ql/src/test/results/clientpositive/join_filters.q.out 4f112bd > ql/src/test/results/clientpositive/join_nulls.q.out 46e0170 > ql/src/test/results/clientpositive/mergejoin.q.out cb96ab3 > ql/src/test/results/clientpositive/skewjoin_mapjoin11.q.out dd084e8 > ql/src/test/results/clientpositive/skewjoinopt19.q.out fd43409 > ql/src/test/results/clientpositive/skewjoinopt20.q.out a28e433 > ql/src/test/results/clientpositive/smb_mapjoin_1.q.out 9ab334b > ql/src/test/results/clientpositive/smb_mapjoin_10.q.out ea2fa51 > ql/src/test/results/clientpositive/smb_mapjoin_2.q.out 379dc0d > ql/src/test/results/clientpositive/smb_mapjoin_25.q.out c0a8959 > ql/src/test/results/clientpositive/smb_mapjoin_3.q.out 26fa5d4 > ql/src/test/results/clientpositive/smb_mapjoin_4.q.out 9fc7f93 > ql/src/test/results/clientpositive/smb_mapjoin_5.q.out 6e6882a > ql/src/test/results/clientpositive/smb_mapjoin_7.q.out 82f5804 > ql/src/test/results/clientpositive/spark/auto_join32.q.out 361a968 > ql/src/test/results/clientpositive/spark/auto_join_filters.q.out 8934433 > ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out 09d2692 > ql/src/test/results/clientpositive/spark/auto_sortmerge_join_12.q.out 8102ec1 > ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out 2ea0a65 > ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out 6281929 > ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out 31e9d86 > ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out 3eceb0b > ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out ddbca05 > ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out 88d4dcb > 
ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out 4e8ce0d > ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out c0a3c3d > ql/src/test/results/clientpositive/spark/bucket_map_join_spark1.q.out 6230bef > ql/src/test/results/clientpositive/spark/bucket_map_join_spark2.q.out 1a33625 > ql/src/test/results/clientpositive/spark/bucket_map_join_spark3.q.out fed923c > ql/src/test/results/clientpositive/spark/bucket_map_join_tez1.q.out 65bded2 > ql/src/test/results/clientpositive/spark/bucket_map_join_tez2.q.out 33e6d63 > ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out 44f4d0c > ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 678ad54 > ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out 95606f0 > ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out d6c25e4 > ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out d82480e > ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 39552c1 > ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out ad2762d > ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out f7c3d4d > ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out 7bfe440 > ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out 4601eb1 > ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out 60bd103 > ql/src/test/results/clientpositive/spark/bucketmapjoin_negative.q.out 031c46c > ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out 4a8f46d > ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out a09904e > ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out cfbce61 > ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 9343805 > ql/src/test/results/clientpositive/spark/skewjoinopt19.q.out eb9bb84 > ql/src/test/results/clientpositive/spark/skewjoinopt20.q.out 22de156 > ql/src/test/results/clientpositive/spark/smb_mapjoin_1.q.out 1ff1262 > 
ql/src/test/results/clientpositive/spark/smb_mapjoin_10.q.out cadf08e > ql/src/test/results/clientpositive/spark/smb_mapjoin_2.q.out a0d51f3 > ql/src/test/results/clientpositive/spark/smb_mapjoin_25.q.out cb811ed > ql/src/test/results/clientpositive/spark/smb_mapjoin_3.q.out f46b833 > ql/src/test/results/clientpositive/spark/smb_mapjoin_4.q.out a421a42 > ql/src/test/results/clientpositive/spark/smb_mapjoin_5.q.out af65010 > ql/src/test/results/clientpositive/spark/smb_mapjoin_7.q.out 622b950 > ql/src/test/results/clientpositive/stats11.q.out e51f049 > ql/src/test/results/clientpositive/tez/auto_join_filters.q.out 8fde41d > ql/src/test/results/clientpositive/tez/auto_sortmerge_join_1.q.out a275d27 > ql/src/test/results/clientpositive/tez/auto_sortmerge_join_11.q.out 6ac74ca > ql/src/test/results/clientpositive/tez/auto_sortmerge_join_12.q.out 8c8a3bf > ql/src/test/results/clientpositive/tez/auto_sortmerge_join_2.q.out 2cb8416 > ql/src/test/results/clientpositive/tez/auto_sortmerge_join_3.q.out abeceb8 > ql/src/test/results/clientpositive/tez/auto_sortmerge_join_4.q.out 8eb9ce5 > ql/src/test/results/clientpositive/tez/auto_sortmerge_join_5.q.out adcc1fa > ql/src/test/results/clientpositive/tez/auto_sortmerge_join_7.q.out 2562cb0 > ql/src/test/results/clientpositive/tez/auto_sortmerge_join_8.q.out 31b0a97 > ql/src/test/results/clientpositive/tez/bucket_map_join_tez1.q.out 61c197f > ql/src/test/results/clientpositive/tez/bucket_map_join_tez2.q.out 3f980b6 > ql/src/test/results/clientpositive/tez/explainuser_2.q.out f84524b > ql/src/test/results/clientpositive/tez/insert_orig_table.q.out 5eea74d > ql/src/test/results/clientpositive/tez/mergejoin.q.out 97df12a > ql/src/test/results/clientpositive/tez/tez_fsstat.q.out 3fcf68c > ql/src/test/results/clientpositive/tez/tez_smb_1.q.out d970bd9 > ql/src/test/results/clientpositive/tez/tez_smb_main.q.out 6183390 > ql/src/test/results/clientpositive/udaf_percentile_approx_23.q.out 14a6874 > > Diff: 
https://reviews.apache.org/r/34576/diff/ > > > Testing > ------- > > > Thanks, > > pengcheng xiong > > --===============3541363232690031661==--