hive-issues mailing list archives

From "Vihang Karajgaonkar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15880) Allow insert overwrite and truncate table query to use auto.purge table property
Date Wed, 19 Jul 2017 15:57:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093316#comment-16093316 ]

Vihang Karajgaonkar commented on HIVE-15880:
--------------------------------------------

Hi [~leftylev], isn't the documentation provided above sufficient? I saw my name in your email
on the dev list regarding the 2.3.0 release. Maybe I was supposed to remove the TODOC label?

> Allow insert overwrite and truncate table query to use auto.purge table property
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-15880
>                 URL: https://issues.apache.org/jira/browse/HIVE-15880
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>              Labels: TODOC2.3
>             Fix For: 2.3.0, 3.0.0
>
>         Attachments: HIVE-15880.01.patch, HIVE-15880.02.patch, HIVE-15880.03.patch, HIVE-15880.04.patch, HIVE-15880.05.patch, HIVE-15880.06.patch
>
>
> It seems inconsistent that the auto.purge property is not considered when we do an INSERT
> OVERWRITE, while it is considered when we do a DROP TABLE.
> DROP TABLE doesn't move table data to Trash when auto.purge is set to true:
> {noformat}
> > create table temp(col1 string, col2 string);
> No rows affected (0.064 seconds)
> > alter table temp set tblproperties('auto.purge'='true');
> No rows affected (0.083 seconds)
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> No rows affected (25.473 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive         22 2017-02-09 13:03 /user/hive/warehouse/temp/000000_0
> #
> > drop table temp;
> No rows affected (0.242 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> ls: `/user/hive/warehouse/temp': No such file or directory
> #
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> #
> {noformat}
> An INSERT OVERWRITE query moves the table data to Trash even when auto.purge is set to true:
> {noformat}
> > create table temp(col1 string, col2 string);
> > alter table temp set tblproperties('auto.purge'='true');
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive         22 2017-02-09 13:07 /user/hive/warehouse/temp/000000_0
> #
> > insert overwrite table temp select * from dummy;
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive         26 2017-02-09 13:08 /user/hive/warehouse/temp/000000_0
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> Found 1 items
> drwx------   - hive hive          0 2017-02-09 13:08 /user/hive/.Trash/Current/user/hive/warehouse/temp
> #
> {noformat}
> While move operations are not very costly on HDFS, they can add significant overhead on
> slow file systems like S3. If the user chooses to set the auto.purge property to true,
> skipping the move to Trash could improve the performance of {{INSERT OVERWRITE TABLE}}
> queries, especially when there are a large number of partitions on tables located on S3.
> Similarly, a {{TRUNCATE TABLE}} query on a table with the {{auto.purge}} property set to true
> should not move the data to Trash.
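
The behavior requested above can be sketched roughly as follows. This is only an illustration in Python on a local directory, not Hive's actual Java code and not HDFS; the names `clear_table_dir` and `demo`, and the trash layout, are hypothetical and chosen just for this example. It assumes the only decision involved is "delete in place when auto.purge is true, otherwise move the old files to a trash directory".

```python
import shutil
import tempfile
from pathlib import Path


def clear_table_dir(table_dir: Path, trash_dir: Path, auto_purge: bool) -> None:
    """Remove a table's existing data files before an overwrite or truncate.

    Hypothetical helper: with auto_purge the files are deleted outright,
    otherwise they are moved under trash_dir (standing in for a Trash move).
    """
    for entry in list(table_dir.iterdir()):
        if auto_purge:
            # Skip the trash entirely and delete in place.
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
        else:
            # Preserve the old data by moving it under the trash directory.
            target = trash_dir / table_dir.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.move(str(entry), str(target / entry.name))


def demo(auto_purge: bool) -> tuple[list[str], list[str]]:
    """Return (files left in the table dir, files found in the trash dir)."""
    with tempfile.TemporaryDirectory() as root:
        table = Path(root) / "warehouse" / "temp"
        trash = Path(root) / ".Trash" / "Current"
        table.mkdir(parents=True)
        trash.mkdir(parents=True)
        (table / "000000_0").write_text("test,test\ntest2,test2\n")
        clear_table_dir(table, trash, auto_purge)
        remaining = sorted(p.name for p in table.iterdir())
        trashed = sorted(p.name for p in trash.rglob("*") if p.is_file())
        return remaining, trashed
```

Calling `demo(True)` leaves both the table directory and the trash empty, while `demo(False)` leaves the old data file in the trash. On a store like S3, where a move is typically a copy followed by a delete, the first path avoids the copy entirely.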



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
