hive-issues mailing list archives

From "Sahil Takiar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15367) CTAS with LOCATION should write temp data under location directory rather than database location
Date Tue, 06 Dec 2016 01:52:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15724008#comment-15724008 ]

Sahil Takiar commented on HIVE-15367:
-------------------------------------

[~spena], [~ychena] you were looking at this logic as part of HIVE-11427. Any chance you could
comment on whether my logic sounds correct?

> CTAS with LOCATION should write temp data under location directory rather than database location
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15367
>                 URL: https://issues.apache.org/jira/browse/HIVE-15367
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>
> For regular CTAS queries, temp data from the SELECT query is written to a staging directory under the database location. The code that controls this is in {{SemanticAnalyzer.java}}:
> {code}
>               // allocate a temporary output dir on the location of the table
>               String tableName = getUnescapedName((ASTNode) ast.getChild(0));
>               String[] names = Utilities.getDbTableName(tableName);
>               Path location;
>               try {
>                 Warehouse wh = new Warehouse(conf);
>                 //Use destination table's db location.
>                 String destTableDb = qb.getTableDesc() != null ? qb.getTableDesc().getDatabaseName() : null;
>                 if (destTableDb == null) {
>                   destTableDb = names[0];
>                 }
>                 location = wh.getDatabasePath(db.getDatabase(destTableDb));
>               } catch (MetaException e) {
>                 throw new SemanticException(e);
>               }
> {code}
> However, CTAS queries allow specifying a {{LOCATION}} for the new table. It's possible for this location to be on a different filesystem than the database location. If this happens, temp data will be written to the database filesystem and then copied to the table filesystem in {{MoveTask}}.
> This extra copy can drastically affect performance. Rather than always using the database location as the staging dir for CTAS queries, Hive should first check whether the CTAS query specifies an explicit {{LOCATION}}. If it does, staging data should be stored under the {{LOCATION}} directory.
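
The proposed check can be sketched in isolation as follows. This is a minimal, standalone illustration and not Hive code: the class {{CtasStagingSketch}}, its methods, and the example URIs are all hypothetical, and plain {{java.net.URI}} stands in for Hadoop's {{Path}} and {{FileSystem}} so that the snippet runs without any Hive or Hadoop dependency. It assumes that two locations are on the same filesystem when their URI scheme and authority match, which is a simplification.

```java
import java.net.URI;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch; none of these names come from Hive.
final class CtasStagingSketch {

    // Pick the directory under which staging data would be written:
    // the explicit LOCATION if the CTAS statement has one,
    // otherwise the database location (the behaviour described in the issue).
    static URI chooseStagingBase(Optional<URI> tableLocation, URI databaseLocation) {
        return tableLocation.orElse(databaseLocation);
    }

    // Simplifying assumption: two URIs are on the same filesystem when
    // scheme and authority are equal (exact, case-sensitive comparison).
    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(a.getScheme(), b.getScheme())
            && Objects.equals(a.getAuthority(), b.getAuthority());
    }

    public static void main(String[] args) {
        // Example values only; they are not taken from the issue.
        URI databaseLocation = URI.create("hdfs://nn1:8020/user/hive/warehouse/sales.db");
        URI tableLocation = URI.create("s3a://bucket/tables/t1");

        URI stagingBase = chooseStagingBase(Optional.of(tableLocation), databaseLocation);
        System.out.println("staging base: " + stagingBase);
        System.out.println("same filesystem as table: "
            + sameFileSystem(stagingBase, tableLocation));
    }
}
```

With the staging base taken from the explicit {{LOCATION}}, the staging data and the final table directory share a filesystem, so the final move would not need a cross-filesystem copy. Without a {{LOCATION}}, the sketch falls back to the database location, matching the current behaviour.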



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
