hive-dev mailing list archives

From "Sushanth Sowmyan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8371) HCatStorer should fail by default when publishing to an existing partition
Date Wed, 08 Oct 2014 03:25:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163012#comment-14163012 ]

Sushanth Sowmyan commented on HIVE-8371:
----------------------------------------

It is going to flip Hive's behaviour, in that it will disallow insert-into if data is already
present. That was intentional, to keep Hive and HCatalog consistent. The question is: do we
want to allow appends to data? If so, Hive and HCatalog should both allow it. If not, Hive
and HCatalog should both deny it.

I do understand the concern that HCatStorer behaviour has changed after being out for a long
time, but from that same perspective, the new HCatStorer behaviour has also been out for a
while now, in publicly released versions of Hive.

The legacy behaviour could still be preserved with yet another warehouse-level parameter
that makes HCatStorer default to immutable and Hive default to mutable, but honestly, I think
that's ugly and will cause more maintainability problems going forward.

> HCatStorer should fail by default when publishing to an existing partition
> --------------------------------------------------------------------------
>
>                 Key: HIVE-8371
>                 URL: https://issues.apache.org/jira/browse/HIVE-8371
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.13.0, 0.14.0, 0.13.1
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>              Labels: hcatalog, partition
>
> In Hive-12 and before (or in previous HCatalog releases), HCatStorer would fail if the
> partition already existed (either before launching the job or during commit, depending on
> the partitioning). HIVE-6406 changed that behavior so that the default is now to append.
> This causes data quality issues, since a rerun (or duplicate run) won't fail (as it used
> to) and will just append to the partition.
> A preferable approach would be to leave HCatStorer behavior as it was (fail on a duplicate
> publish) and support append through an option. Overwrite could be implemented in a similar
> fashion. E.g.:
> store A into 'db.table' using org.apache.hive.hcatalog.pig.HCatStorer('partspec', '', ' -append');
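
The three publish policies discussed in this issue (fail by default, append on request, overwrite on request) can be sketched roughly as below. This is only an illustrative model, not HCatStorer's implementation: the names `PublishMode`, `PartitionExistsError`, and `publish` are hypothetical, and a plain dict stands in for the table's partitions.

```python
from enum import Enum


class PublishMode(Enum):
    """Hypothetical publish policies; not an actual HCatalog API."""
    FAIL = "fail"            # default proposed in the issue: reject duplicate publishes
    APPEND = "append"        # opt-in: add rows to the existing partition
    OVERWRITE = "overwrite"  # opt-in: replace the existing partition


class PartitionExistsError(Exception):
    """Raised when publishing to an existing partition in FAIL mode."""


def publish(partitions, key, rows, mode=PublishMode.FAIL):
    """Publish `rows` into `partitions[key]` according to `mode`.

    `partitions` is a dict mapping a partition key to a list of rows;
    it is an assumption standing in for the real table storage.
    """
    rows = list(rows)
    if key not in partitions:
        # New partition: every mode behaves the same way.
        partitions[key] = rows
    elif mode is PublishMode.FAIL:
        # A rerun or duplicate run is rejected instead of silently appending.
        raise PartitionExistsError(f"partition {key!r} already exists")
    elif mode is PublishMode.APPEND:
        partitions[key].extend(rows)
    else:
        # PublishMode.OVERWRITE
        partitions[key] = rows
    return partitions[key]
```

In this sketch a duplicate run with the default mode raises an error and leaves the partition untouched, which is the data-quality property the reporter wants to restore; append and overwrite only happen when the caller asks for them explicitly.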



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
