hive-issues mailing list archives

From "Gabor Kaszab (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19830) Inconsistent behavior when multiple partitions point to the same location
Date Fri, 15 Jun 2018 06:29:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513413#comment-16513413
] 

Gabor Kaszab commented on HIVE-19830:
-------------------------------------

When a partition is dropped like this, another option would be to show the user an error
saying that the partition can't be dropped because doing so would also affect other partitions.
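
A minimal sketch of that check, not Hive's actual implementation. It assumes a hypothetical in-memory mapping from partition name to location standing in for the metastore; the names partitions_sharing_location, drop_partition and SharedLocationError are made up for illustration.

```python
class SharedLocationError(Exception):
    """Raised when dropping a partition would remove data used by other partitions."""


def partitions_sharing_location(partitions, name):
    """Return the other partitions whose location equals that of `name`."""
    target = partitions[name].rstrip("/")
    return sorted(
        other
        for other, location in partitions.items()
        if other != name and location.rstrip("/") == target
    )


def drop_partition(partitions, name):
    """Drop `name` unless another partition points to the same location."""
    shared = partitions_sharing_location(partitions, name)
    if shared:
        raise SharedLocationError(
            f"cannot drop partition {name}: "
            f"its location is also used by {', '.join(shared)}"
        )
    del partitions[name]


# Hypothetical stand-in for the metastore, mirroring the example in the issue:
# both partitions point to the j=1 directory.
partitions = {
    "j=1": "hdfs://localhost:20500/test-warehouse/test/j=1",
    "j=2": "hdfs://localhost:20500/test-warehouse/test/j=1",
}
```

With this mapping, drop_partition(partitions, "j=2") raises SharedLocationError and leaves both partitions in place, instead of removing the directory that j=1 still points to.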

> Inconsistent behavior when multiple partitions point to the same location
> -------------------------------------------------------------------------
>
>                 Key: HIVE-19830
>                 URL: https://issues.apache.org/jira/browse/HIVE-19830
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.4.0
>            Reporter: Gabor Kaszab
>            Assignee: Adam Szita
>            Priority: Major
>
> // Create a table with 2 partitions where both partitions share the same location, and
> // insert a single line into one of them.
> create table test (i int) partitioned by (j int) stored as parquet;
> alter table test add partition (j=1) location 'hdfs://localhost:20500/test-warehouse/test/j=1';
> alter table test add partition (j=2) location 'hdfs://localhost:20500/test-warehouse/test/j=1';
> insert into table test partition (j=1) values (1);
> // select * shows this single line in both partitions, as expected.
> select * from test;
> 1 1
> 1 2
> // However, sum() doesn't add up the line for all the partitions. This is +Issue #1+.
> select sum(i), sum(j) from test;
> 1 2
> // On the file system there is a common dir for the 2 partitions, as expected.
> hdfs dfs -ls hdfs://localhost:20500/test-warehouse/test/
> Found 1 items
> drwxr-xr-x - gaborkaszab supergroup 0 2018-06-08 10:54 hdfs://localhost:20500/test-warehouse/test/j=1
> // Let's drop one of the partitions now!
> alter table test drop partition (j=2);
> // Running the same hdfs dfs -ls command shows that the j=1 directory has been dropped. I think
> // this is good behavior; we just have to document that this is the expected case.
> // select * from test; returns zero rows, this is still as expected.
> // Even though the dir is dropped, the j=1 partition is still visible with show partitions.
> // This is +Issue #2+.
> show partitions test;
> j=1
> After the directory is dropped with Hive, when Impala reloads its partitions it asks Hive
> which partitions exist. Apparently, Hive sends a list that includes the j=1 partition;
> Impala then treats it as an existing partition and doesn't drop it from the Catalog's cache.
> Hive shouldn't send that partition in this case. This is +Issue #3+.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
