hive-issues mailing list archives

From "wangzhihao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-18927) Hive "insert overwrite" doesn't replace the destination files if no partition in metastore for the files
Date Sat, 10 Mar 2018 06:08:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-18927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangzhihao updated HIVE-18927:
------------------------------
    Description: 
[This post|http://www.ericlin.me/2015/05/hive-insert-overwrite-does-not-remove-existing-data/]
describes a way to reproduce this issue:
{noformat}
# Put a file into the partition directory on the file system, with no partition in the metastore to track it.
hdfs dfs -put test.txt test/p=p1

# Insert overwrite the partition (p = 'p1').
DROP TABLE IF EXISTS partition_test;
CREATE EXTERNAL TABLE partition_test (a int) PARTITIONED BY (p string);
INSERT OVERWRITE TABLE partition_test PARTITION (p = 'p1') SELECT 123;

# Verify that test.txt has not been removed.
hdfs dfs -ls test/p=p1
Found 2 items
-rwxr-xr-x   3 hdfs supergroup     194965 2015-05-05 00:15 test/p=p1/000000_0
-rw-r--r--   3 hdfs supergroup          8 2015-05-05 00:10 test/p=p1/test.txt
{noformat}
The reason is that [Hive.loadPartition|https://github.com/apache/hive/blob/9b36ffa92cc4e0f47ea03d8d167debe743342f5b/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1652]
tries to {{replaceFiles}} only if {{oldPath}} exists. Because the metastore has no partition
for these files, {{oldPath}} is null, so the existing files are never cleaned up. To fix the
issue, we should also clean {{destf}} in the method [Hive.replaceFiles|https://github.com/apache/hive/blob/b362de3871764731d8371657b07140e37a3c5105/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3817].
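
The control flow described above can be illustrated with a small simulation on the local file system. This is only a sketch under stated assumptions: {{replace_files}}, {{run_scenario}} and the {{clean_dest}} flag are hypothetical names invented for this example, and the code is a toy model of the reported behaviour, not the code in Hive.java.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def replace_files(dest: Path, new_files: Dict[str, str],
                  old_path: Optional[Path], clean_dest: bool) -> None:
    """Toy stand-in for the overwrite step (hypothetical, not Hive's code)."""
    # Reported behaviour: only the location known to the metastore is cleaned.
    to_clean = [old_path] if old_path is not None else []
    # Proposed behaviour: also clean the destination directory itself.
    if clean_dest and dest not in to_clean:
        to_clean.append(dest)
    for directory in to_clean:
        if directory.is_dir():
            for child in directory.iterdir():
                if child.is_file():
                    child.unlink()
    # Write the new data files into the destination.
    dest.mkdir(parents=True, exist_ok=True)
    for name, content in new_files.items():
        (dest / name).write_text(content)


def run_scenario(clean_dest: bool) -> List[str]:
    """Recreate the reproduction: a stale file exists, but no partition does."""
    with tempfile.TemporaryDirectory() as root:
        partition_dir = Path(root) / "test" / "p=p1"
        partition_dir.mkdir(parents=True)
        (partition_dir / "test.txt").write_text("stale\n")
        # The metastore has no partition for p=p1, so old_path is None.
        replace_files(partition_dir, {"000000_0": "123\n"},
                      old_path=None, clean_dest=clean_dest)
        return sorted(p.name for p in partition_dir.iterdir())


if __name__ == "__main__":
    print(run_scenario(clean_dest=False))  # stale test.txt survives
    print(run_scenario(clean_dest=True))   # only the new data file remains
```

With {{clean_dest=False}} the stale {{test.txt}} survives next to the new data file, which matches the listing above. With {{clean_dest=True}} only the newly written file remains.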


> Hive "insert overwrite" doesn't replace the destination files if no partition in metastore for the files
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18927
>                 URL: https://issues.apache.org/jira/browse/HIVE-18927
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: wangzhihao
>            Priority: Major
>



