Date: Tue, 6 Dec 2016 21:35:59 +0000 (UTC)
From: Sergio Peña (JIRA)
To: issues@hive.apache.org
Subject: [jira] [Commented] (HIVE-15367) CTAS with LOCATION should write temp data under location directory rather than database location

[ https://issues.apache.org/jira/browse/HIVE-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726782#comment-15726782 ]

Sergio Peña commented on HIVE-15367:
------------------------------------
[~stakiar] The patch looks good. Could you add some tests to validate the LOCATION scenarios? Or are there tests that already do that? Can you add hive-blobstore tests as well?

> CTAS with LOCATION should write temp data under location directory rather than database location
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15367
>                 URL: https://issues.apache.org/jira/browse/HIVE-15367
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15367.1.patch
>
>
> For regular CTAS queries, temp data from a SELECT query will be written to a staging directory under the database location. The code to control this is in {{SemanticAnalyzer.java}}
> {code}
>       // allocate a temporary output dir on the location of the table
>       String tableName = getUnescapedName((ASTNode) ast.getChild(0));
>       String[] names = Utilities.getDbTableName(tableName);
>       Path location;
>       try {
>         Warehouse wh = new Warehouse(conf);
>         //Use destination table's db location.
>         String destTableDb = qb.getTableDesc() != null? qb.getTableDesc().getDatabaseName(): null;
>         if (destTableDb == null) {
>           destTableDb = names[0];
>         }
>         location = wh.getDatabasePath(db.getDatabase(destTableDb));
>       } catch (MetaException e) {
>         throw new SemanticException(e);
>       }
> {code}
> However, CTAS queries allow specifying a {{LOCATION}} for the new table. It's possible for this location to be on a different filesystem than the database location. If this happens, temp data will be written to the database filesystem and will be copied to the table filesystem in {{MoveTask}}.
> This extra copying of data can drastically affect performance. Rather than always using the database location as the staging dir for CTAS queries, Hive should first check if there is an explicit {{LOCATION}} specified in the CTAS query.
> If there is, staging data should be stored under the {{LOCATION}} directory.
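A minimal, hypothetical sketch of the staging-directory choice proposed above may help when writing the requested LOCATION tests. It models paths as {{java.net.URI}} instead of Hadoop's {{Path}}, treats "same filesystem" as "same URI scheme and authority" (a simplification of how Hadoop resolves filesystems), and uses illustrative names ({{CtasStagingDirSketch}}, {{chooseStagingParent}}, {{needsCrossFsCopy}}) that do not exist in Hive. It is not the code in HIVE-15367.1.patch.

```java
import java.net.URI;
import java.util.Objects;
import java.util.Optional;

/**
 * Hypothetical sketch of the CTAS staging-directory choice discussed in HIVE-15367.
 * Paths are modelled as java.net.URI rather than Hadoop's Path; names are illustrative only.
 */
public class CtasStagingDirSketch {

    /** Picks the parent directory under which CTAS staging data would be written. */
    static URI chooseStagingParent(URI databaseLocation, Optional<URI> explicitTableLocation) {
        // Proposed behaviour: prefer the CTAS LOCATION clause when one is given,
        // otherwise fall back to the destination database's location (current behaviour).
        return explicitTableLocation.orElse(databaseLocation);
    }

    /**
     * Simplified filesystem check: two URIs are treated as the same filesystem when
     * scheme and authority match exactly. Hadoop itself resolves FileSystem instances.
     */
    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(a.getScheme(), b.getScheme())
            && Objects.equals(a.getAuthority(), b.getAuthority());
    }

    /** True when moving staged data into the table location would cross filesystems (a copy). */
    static boolean needsCrossFsCopy(URI stagingParent, URI tableLocation) {
        return !sameFileSystem(stagingParent, tableLocation);
    }

    public static void main(String[] args) {
        URI dbLocation = URI.create("hdfs://namenode:8020/user/hive/warehouse/sales.db");
        URI tableLocation = URI.create("s3a://example-bucket/tables/sales_summary");

        URI current = dbLocation; // current behaviour ignores the LOCATION clause
        URI proposed = chooseStagingParent(dbLocation, Optional.of(tableLocation));

        System.out.println("current  -> copy needed: " + needsCrossFsCopy(current, tableLocation));
        System.out.println("proposed -> copy needed: " + needsCrossFsCopy(proposed, tableLocation));
    }
}
```

With an HDFS database location and an S3A table {{LOCATION}}, {{main}} prints {{true}} for the current choice and {{false}} for the proposed one, which is the cross-filesystem copy in {{MoveTask}} that the issue aims to avoid.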