Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Content-Type: multipart/alternative; boundary="===============8251973988135829773=="
MIME-Version: 1.0
Subject: Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3
From: Sergio Pena
To: hive, Reuben Kuhnert, Thomas Poepping, Ashutosh Chauhan, Lefty Leverenz, Sergio Pena
Date: Thu, 04 Aug 2016 16:29:29 -0000
Message-ID: <20160804162929.29918.54553@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org/
Auto-Submitted: auto-generated
Sender: Sergio Pena
X-ReviewGroup: hive
X-ReviewRequest-URL: https://reviews.apache.org/r/50359/
References: <20160728201158.11792.91628@reviews.apache.org>
In-Reply-To: <20160728201158.11792.91628@reviews.apache.org>
X-ReviewBoard-Diff-For: common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java
X-ReviewBoard-Diff-For: common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java
X-ReviewBoard-Diff-For: ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java
Reply-To: Sergio Pena
X-ReviewRequest-Repository: hive-git
archived-at: Thu, 04 Aug 2016 16:29:33 -0000

--===============8251973988135829773==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

(Updated Aug. 4, 2016, 4:29 p.m.)


Review request for hive.


Changes
-------

Addressed minor comments. Removed the code that was duplicating the rename() to S3. Instead, it gets HDFS scratch directories for the required temporary files.


Bugs: HIVE-14270
    https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
-------

This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.

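As a rough illustration of the idea in the description (not the code in the patch), the sketch below picks an HDFS scratch directory whenever the target table lives on a blob store such as S3, and otherwise stages data next to the table. Everything here is an assumption made for the example: the class name ScratchDirSketch, the list of schemes, the hdfs://namenode:8020/tmp/hive location, the ".hive-staging_" prefix and the query id are hypothetical and are not taken from BlobStorageUtils or Context.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration only; not the BlobStorageUtils class from the patch. */
class ScratchDirSketch {
    // Assumed set of URI schemes that denote blob storage.
    private static final Set<String> BLOB_SCHEMES =
        new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    /** True when the scheme (case-insensitive) is in the assumed blob-store list. */
    static boolean isBlobStoreScheme(String scheme) {
        return scheme != null && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    /** True when the location URI uses a blob-store scheme. */
    static boolean isBlobStorePath(String location) {
        return isBlobStoreScheme(URI.create(location).getScheme());
    }

    /**
     * Chooses where intermediate data would go: on HDFS for blob-store tables,
     * otherwise in a staging directory under the table location.
     */
    static String chooseScratchDir(String tableLocation, String hdfsScratchDir, String queryId) {
        if (isBlobStorePath(tableLocation)) {
            return hdfsScratchDir + "/" + queryId;
        }
        return tableLocation + "/.hive-staging_" + queryId;
    }

    public static void main(String[] args) {
        String hdfsScratch = "hdfs://namenode:8020/tmp/hive"; // hypothetical location
        System.out.println(chooseScratchDir(
            "s3a://spena-bucket/user/hive/warehouse/s3dummy", hdfsScratch, "query_0001"));
        System.out.println(chooseScratchDir(
            "hdfs://namenode:8020/user/hive/warehouse/dummy", hdfsScratch, "query_0001"));
    }
}
```

The point of the design is that intermediate files are written and renamed on HDFS, where rename is cheap, so only the final result has to be moved to S3.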
Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION

Diff: https://reviews.apache.org/r/50359/diff/


Testing
-------

NO PATCH

** NON-PARTITIONED TABLE
- create table dummy (id int);    3.651s
- insert into table s3dummy values (1);    39.231s
- insert overwrite table s3dummy values (1);    42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;    30.136s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    9.297s
- insert into table s3dummy_ext values (1);    45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    3.945s
- insert into table s3dummy values (1);    15.025s
- insert overwrite table s3dummy values (1);    25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;    19.158s
- from dummy insert overwrite table s3dummy select *;    25.469s
- from dummy insert into table s3dummy select *;    14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';    4.827s
- insert into table s3dummy_ext values (1);    16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';    3.176s
- alter table s3dummypart add partition (part=1);    3.229s
- alter table s3dummypart add partition (part=2);    3.124s
- insert into table s3dummypart partition (part=1) values (1);    14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);    27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;    22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;    29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id;    14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;    15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;    18.820s


Thanks,

Sergio Pena

--===============8251973988135829773==--