Date: Tue, 22 Nov 2016 22:49:58 +0000 (UTC)
From: "Sahil Takiar (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

    [ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688185#comment-15688185 ]

Sahil Takiar commented on HIVE-15121:
-------------------------------------

[~spena] test failures look unrelated, and the tests are failing
on other patches too.

> Last MR job in Hive should be able to write to a different scratch directory
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-15121
>                 URL: https://issues.apache.org/jira/browse/HIVE-15121
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that for a multi-job query, all intermediate MR jobs write to HDFS, and then the final job writes to S3. Writing to HDFS should be faster than writing to S3, so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the scratch directory to the final table directory can be done server-side, within the blobstore. The MoveTask simply renames data from the scratch directory to the final table location, which should translate to a server-side COPY request. This way HiveServer2 doesn't have to actually copy any data, it just tells the blobstore to do all the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
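
For readers skimming the archive, the scheme in the quoted description can be sketched roughly as follows. This is not code from any of the attached patches, and it is written in Python rather than Hive's Java purely for brevity. The names `plan_scratch_dirs`, `is_server_side_move`, the two scratch paths, and the set of blobstore schemes are all hypothetical. The sketch assumes that a rename within one bucket is what "should translate to a server-side COPY request".

```python
from urllib.parse import urlparse

# Hypothetical locations, used only for illustration.
HDFS_SCRATCH = "hdfs://namenode:8020/tmp/hive"
BLOBSTORE_SCRATCH = "s3a://example-bucket/warehouse/tbl/.hive-staging"

# URI schemes treated as blobstores in this sketch (an assumption).
BLOBSTORE_SCHEMES = {"s3", "s3a", "s3n"}


def plan_scratch_dirs(jobs, hdfs_scratch=HDFS_SCRATCH,
                      blob_scratch=BLOBSTORE_SCRATCH):
    """Pair each job with a scratch dir: HDFS for all but the last job."""
    last = len(jobs) - 1
    return [(job, blob_scratch if i == last else hdfs_scratch)
            for i, job in enumerate(jobs)]


def is_server_side_move(src, dst):
    """True when src and dst sit in the same blobstore bucket."""
    s, d = urlparse(src), urlparse(dst)
    return (s.scheme in BLOBSTORE_SCHEMES
            and s.scheme == d.scheme
            and s.netloc == d.netloc)


if __name__ == "__main__":
    # A three-job query: only the final job targets the blobstore.
    for job, scratch in plan_scratch_dirs(["stage-1", "stage-2", "stage-3"]):
        print(job, "->", scratch)
```

With this layout, the final move from the blobstore scratch directory to a table location in the same bucket passes the `is_server_side_move` check, while a move from the HDFS scratch directory would not and would require the data to be copied through the client.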