Content-Type: multipart/alternative; boundary="===============2814518955762438683=="
MIME-Version: 1.0
Subject: Re: Review Request 34576: Bucketized Table feature fails in some cases
From: "Gopal V"
To: "John Pullokkaran"
Cc: "pengcheng xiong", "Xuefu Zhang", "hive", "Gopal V"
Date: Wed, 27 May 2015 04:53:54 -0000
Message-ID: <20150527045354.9497.1189@reviews.apache.org>
References: <20150524020316.17474.18629@reviews.apache.org>
In-Reply-To: <20150524020316.17474.18629@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org/
X-ReviewRequest-URL: https://reviews.apache.org/r/34576/
X-ReviewRequest-Repository: hive-git

--===============2814518955762438683==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit


> On May 24, 2015, 2:03 a.m., Xuefu Zhang wrote:
> > Have you thought about what happens if the client is not interactive, such as JDBC or Thrift?
>
> pengcheng xiong wrote:
> I am sorry, we have not thought about it yet. We admit that the patch does not cover the case where the client is not interactive. Do you have any good ideas that you can share with us? Do you think logging this, in addition to printing a warning msg, is good enough? Thanks.
>
> Xuefu Zhang wrote:
> There are all kinds of issues with loading data into bucketed tables. While advanced users might be able to load data correctly, I think that's really rare. The data in a bucketed table needs to be generated by Hive. Therefore, I think we should disable "insert into" and "load data into|overwrite" for a bucketed table. We should also disallow external tables for the same reason.
>
> To allow advanced users to achieve what they used to do, we can have a flag, such as "hive.enforce.strict.bucketing", which defaults to true. Those users can proceed by turning this off.
>
> Another option for "insert into" would be to support appending new data, as proposed in HIVE-3244.

Why would you disable "insert into" bucketed tables?

How else would ACID work?


- Gopal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34576/#review85082
-----------------------------------------------------------


On May 23, 2015, 5:47 p.m., pengcheng xiong wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34576/
> -----------------------------------------------------------
>
> (Updated May 23, 2015, 5:47 p.m.)
>
>
> Review request for hive and John Pullokkaran.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Bucketized Table feature fails in some cases. If the source and destination are bucketed on the same key, and the actual data in the source is not bucketed (because it was loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed when it is written to the destination.
> Example
> ----------------------------------------------------------------------
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --------------------------------------------------
> This is not a regression. This has never worked.
> It was only discovered because of the Hadoop2 changes.
> In Hadoop1, in local mode, the number of reducers is always 1, regardless of what the application requests. Hadoop2 now honors the number-of-reducers setting in local mode (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA for bucketed tables.
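The bucketing invariant at stake in the description above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Hive code: `bucket_for`, `load_data`, `insert_bucketed` and `is_properly_bucketed` are hypothetical names, the hash is a simple stand-in for whatever Hive actually uses, and the planner behaviour for same-key source and destination tables is not modelled. It only shows that a file copied in as-is (as LOAD DATA does) need not satisfy "one file per bucket, every row in the file its key hashes to", while a write that routes rows by hash does.

```python
# Toy model of bucketed-table layout. All names here are hypothetical and
# the hash is only a stand-in; Hive's real bucketing hash is not reproduced.

def bucket_for(key: str, num_buckets: int) -> int:
    """Assign a key to a bucket: (non-negative hash of key) mod num_buckets."""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0x7FFFFFFF  # deterministic toy string hash
    return h % num_buckets

def load_data(rows):
    """Model of LOAD DATA: the input file lands in the table as one file, unchanged."""
    return [list(rows)]

def insert_bucketed(rows, num_buckets):
    """Model of a write that routes each row by hash and sorts within each bucket file."""
    files = [[] for _ in range(num_buckets)]
    for key, val in rows:
        files[bucket_for(key, num_buckets)].append((key, val))
    return [sorted(f) for f in files]

def is_properly_bucketed(files, num_buckets):
    """True if there is one file per bucket and every row is in its hashed bucket."""
    if len(files) != num_buckets:
        return False
    return all(
        bucket_for(key, num_buckets) == i
        for i, f in enumerate(files)
        for key, _ in f
    )

rows = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")]
loaded = load_data(rows)              # one file, rows in arrival order
rewritten = insert_bucketed(rows, 2)  # two files, rows routed by hash

print(is_properly_bucketed(loaded, 2))     # False
print(is_properly_bucketed(rewritten, 2))  # True
```

Under this model, any reader that trusts the table metadata (2 buckets, clustered by key) would draw wrong conclusions from `loaded`, which is why the review discusses warning about or preventing LOAD DATA into bucketed tables.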
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java e53933e
> ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 1a9b42b
> ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out 623c2e8
> ql/src/test/results/clientnegative/bucket_mapjoin_wrong_table_metadata_1.q.out f4522d2
> ql/src/test/results/clientnegative/bucket_mapjoin_wrong_table_metadata_2.q.out 9aa9b5d
> ql/src/test/results/clientnegative/exim_11_nonpart_noncompat_sorting.q.out 9220c8e
> ql/src/test/results/clientpositive/auto_join32.q.out bfc8be8
> ql/src/test/results/clientpositive/auto_join_filters.q.out a6720d9
> ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 383defd
> ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out e6e7ef3
> ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out e9fb705
> ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out c089419
> ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out 6e443fa
> ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out feaea04
> ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out f64ecf0
> ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e89f548
> ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 44c037f
> ql/src/test/results/clientpositive/bucket_map_join_1.q.out d778203
> ql/src/test/results/clientpositive/bucket_map_join_2.q.out aef77aa
> ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out 870ecdd
> ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out 33f5c46
> ql/src/test/results/clientpositive/bucket_map_join_spark3.q.out 067d1ff
> ql/src/test/results/clientpositive/bucketcontext_1.q.out 77bfcf9
> ql/src/test/results/clientpositive/bucketcontext_2.q.out a9db13d
> ql/src/test/results/clientpositive/bucketcontext_3.q.out 9ba3e0c
> ql/src/test/results/clientpositive/bucketcontext_4.q.out a2b37a8
> ql/src/test/results/clientpositive/bucketcontext_5.q.out 3ee1f0e
> ql/src/test/results/clientpositive/bucketcontext_6.q.out d2304fa
> ql/src/test/results/clientpositive/bucketcontext_7.q.out 1a105ed
> ql/src/test/results/clientpositive/bucketcontext_8.q.out 138e415
> ql/src/test/results/clientpositive/bucketizedhiveinputformat_auto.q.out 215efdd
> ql/src/test/results/clientpositive/bucketmapjoin1.q.out 72f2a07
> ql/src/test/results/clientpositive/bucketmapjoin10.q.out b0e849d
> ql/src/test/results/clientpositive/bucketmapjoin11.q.out 4263cab
> ql/src/test/results/clientpositive/bucketmapjoin12.q.out bcd7394
> ql/src/test/results/clientpositive/bucketmapjoin2.q.out a8d9e9d
> ql/src/test/results/clientpositive/bucketmapjoin3.q.out c759f05
> ql/src/test/results/clientpositive/bucketmapjoin4.q.out f61500c
> ql/src/test/results/clientpositive/bucketmapjoin5.q.out 0cb2825
> ql/src/test/results/clientpositive/bucketmapjoin7.q.out 667a9db
> ql/src/test/results/clientpositive/bucketmapjoin8.q.out 252b377
> ql/src/test/results/clientpositive/bucketmapjoin9.q.out 5e28dc3
> ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out 6ae127d
> ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out 4c9f54a
> ql/src/test/results/clientpositive/bucketmapjoin_negative3.q.out 9a0bfc4
> ql/src/test/results/clientpositive/groupby_sort_1_23.q.out 34cd1ff
> ql/src/test/results/clientpositive/groupby_sort_2.q.out b5e52f1
> ql/src/test/results/clientpositive/groupby_sort_3.q.out c16911a
> ql/src/test/results/clientpositive/groupby_sort_4.q.out a6b1c3d
> ql/src/test/results/clientpositive/groupby_sort_5.q.out 369e2b5
> ql/src/test/results/clientpositive/groupby_sort_7.q.out 7264695
> ql/src/test/results/clientpositive/groupby_sort_8.q.out ec16eb0
> ql/src/test/results/clientpositive/groupby_sort_9.q.out e49781a
> ql/src/test/results/clientpositive/groupby_sort_skew_1_23.q.out 0d631ce
> ql/src/test/results/clientpositive/groupby_sort_test_1.q.out 8c1765d
> ql/src/test/results/clientpositive/insert_orig_table.q.out 5eea74d
> ql/src/test/results/clientpositive/insert_values_orig_table.q.out 684cd1b
> ql/src/test/results/clientpositive/join_filters.q.out 4f112bd
> ql/src/test/results/clientpositive/join_nulls.q.out 46e0170
> ql/src/test/results/clientpositive/mergejoin.q.out cb96ab3
> ql/src/test/results/clientpositive/skewjoin_mapjoin11.q.out dd084e8
> ql/src/test/results/clientpositive/skewjoinopt19.q.out fd43409
> ql/src/test/results/clientpositive/skewjoinopt20.q.out a28e433
> ql/src/test/results/clientpositive/smb_mapjoin_1.q.out 9ab334b
> ql/src/test/results/clientpositive/smb_mapjoin_10.q.out ea2fa51
> ql/src/test/results/clientpositive/smb_mapjoin_2.q.out 379dc0d
> ql/src/test/results/clientpositive/smb_mapjoin_25.q.out c0a8959
> ql/src/test/results/clientpositive/smb_mapjoin_3.q.out 26fa5d4
> ql/src/test/results/clientpositive/smb_mapjoin_4.q.out 9fc7f93
> ql/src/test/results/clientpositive/smb_mapjoin_5.q.out 6e6882a
> ql/src/test/results/clientpositive/smb_mapjoin_7.q.out 82f5804
> ql/src/test/results/clientpositive/spark/auto_join32.q.out 361a968
> ql/src/test/results/clientpositive/spark/auto_join_filters.q.out 8934433
> ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out 09d2692
> ql/src/test/results/clientpositive/spark/auto_sortmerge_join_12.q.out 8102ec1
> ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out 2ea0a65
> ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out 6281929
> ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out 31e9d86
> ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out 3eceb0b
> ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out ddbca05
> ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out 88d4dcb
> ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out 4e8ce0d
> ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out c0a3c3d
> ql/src/test/results/clientpositive/spark/bucket_map_join_spark1.q.out 6230bef
> ql/src/test/results/clientpositive/spark/bucket_map_join_spark2.q.out 1a33625
> ql/src/test/results/clientpositive/spark/bucket_map_join_spark3.q.out fed923c
> ql/src/test/results/clientpositive/spark/bucket_map_join_tez1.q.out 65bded2
> ql/src/test/results/clientpositive/spark/bucket_map_join_tez2.q.out 33e6d63
> ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out 44f4d0c
> ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 678ad54
> ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out 95606f0
> ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out d6c25e4
> ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out d82480e
> ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 39552c1
> ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out ad2762d
> ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out f7c3d4d
> ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out 7bfe440
> ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out 4601eb1
> ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out 60bd103
> ql/src/test/results/clientpositive/spark/bucketmapjoin_negative.q.out 031c46c
> ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out 4a8f46d
> ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out a09904e
> ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out cfbce61
> ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 9343805
> ql/src/test/results/clientpositive/spark/skewjoinopt19.q.out eb9bb84
> ql/src/test/results/clientpositive/spark/skewjoinopt20.q.out 22de156
> ql/src/test/results/clientpositive/spark/smb_mapjoin_1.q.out 1ff1262
> ql/src/test/results/clientpositive/spark/smb_mapjoin_10.q.out cadf08e
> ql/src/test/results/clientpositive/spark/smb_mapjoin_2.q.out a0d51f3
> ql/src/test/results/clientpositive/spark/smb_mapjoin_25.q.out cb811ed
> ql/src/test/results/clientpositive/spark/smb_mapjoin_3.q.out f46b833
> ql/src/test/results/clientpositive/spark/smb_mapjoin_4.q.out a421a42
> ql/src/test/results/clientpositive/spark/smb_mapjoin_5.q.out af65010
> ql/src/test/results/clientpositive/spark/smb_mapjoin_7.q.out 622b950
> ql/src/test/results/clientpositive/stats11.q.out e51f049
> ql/src/test/results/clientpositive/tez/auto_join_filters.q.out 8fde41d
> ql/src/test/results/clientpositive/tez/auto_sortmerge_join_1.q.out a275d27
> ql/src/test/results/clientpositive/tez/auto_sortmerge_join_11.q.out 6ac74ca
> ql/src/test/results/clientpositive/tez/auto_sortmerge_join_12.q.out 8c8a3bf
> ql/src/test/results/clientpositive/tez/auto_sortmerge_join_2.q.out 2cb8416
> ql/src/test/results/clientpositive/tez/auto_sortmerge_join_3.q.out abeceb8
> ql/src/test/results/clientpositive/tez/auto_sortmerge_join_4.q.out 8eb9ce5
> ql/src/test/results/clientpositive/tez/auto_sortmerge_join_5.q.out adcc1fa
> ql/src/test/results/clientpositive/tez/auto_sortmerge_join_7.q.out 2562cb0
> ql/src/test/results/clientpositive/tez/auto_sortmerge_join_8.q.out 31b0a97
> ql/src/test/results/clientpositive/tez/bucket_map_join_tez1.q.out 61c197f
> ql/src/test/results/clientpositive/tez/bucket_map_join_tez2.q.out 3f980b6
> ql/src/test/results/clientpositive/tez/explainuser_2.q.out f84524b
> ql/src/test/results/clientpositive/tez/insert_orig_table.q.out 5eea74d
> ql/src/test/results/clientpositive/tez/mergejoin.q.out 97df12a
> ql/src/test/results/clientpositive/tez/tez_fsstat.q.out 3fcf68c
> ql/src/test/results/clientpositive/tez/tez_smb_1.q.out d970bd9
> ql/src/test/results/clientpositive/tez/tez_smb_main.q.out 6183390
> ql/src/test/results/clientpositive/udaf_percentile_approx_23.q.out 14a6874
>
> Diff: https://reviews.apache.org/r/34576/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> pengcheng xiong
>
>

--===============2814518955762438683==--