hive-dev mailing list archives

From "Thiruvel Thirumoolan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-8371) HCatStorer should fail by default when publishing to an existing partition
Date Mon, 06 Oct 2014 23:49:34 GMT
Thiruvel Thirumoolan created HIVE-8371:
------------------------------------------

             Summary: HCatStorer should fail by default when publishing to an existing partition
                 Key: HIVE-8371
                 URL: https://issues.apache.org/jira/browse/HIVE-8371
             Project: Hive
          Issue Type: Bug
          Components: HCatalog
    Affects Versions: 0.13.1, 0.13.0, 0.14.0
            Reporter: Thiruvel Thirumoolan


In Hive 0.12 and earlier (and in previous standalone HCatalog releases), HCatStorer would fail if the
partition already existed, either before the job launched or during commit, depending on the
partitioning. HIVE-6406 changed that behavior: HCatStorer now appends by default. This causes data
quality issues, since a rerun (or duplicate run) no longer fails as it used to and instead silently
appends to the partition.

A preferable approach would be to keep the original HCatStorer behavior (fail on a duplicate
publish) as the default and support append through an explicit option. Overwrite could be
implemented in a similar fashion. For example:

store A into 'db.table' using org.apache.hive.hcatalog.pig.HCatStorer('partspec', '', ' -append');
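
To make the proposed semantics concrete, here is a minimal Python sketch of fail-by-default publishing with opt-in append and overwrite. It is only an illustration under stated assumptions: `PublishMode`, `PartitionExistsError`, `publish`, and the dict used as a table are all hypothetical names invented for this sketch, not part of HCatalog or HCatStorer.

```python
from enum import Enum


class PublishMode(Enum):
    # All names here are hypothetical; none of this is HCatalog API.
    FAIL = "fail"            # proposed default: reject a duplicate publish
    APPEND = "append"        # opt-in, as with the ' -append' argument above
    OVERWRITE = "overwrite"  # opt-in, mentioned as a possible follow-up


class PartitionExistsError(Exception):
    """Raised when publishing to an existing partition without opting in."""


def publish(table, partition, rows, mode=PublishMode.FAIL):
    """Publish rows into table[partition] under the given mode.

    `table` is a plain dict mapping a partition key to a list of rows. It
    stands in for the metastore plus storage and is purely illustrative.
    """
    if partition not in table:
        # New partition: every mode behaves the same way.
        table[partition] = list(rows)
    elif mode is PublishMode.FAIL:
        raise PartitionExistsError(f"partition {partition!r} already exists")
    elif mode is PublishMode.APPEND:
        table[partition].extend(rows)
    else:  # PublishMode.OVERWRITE
        table[partition] = list(rows)
    return table[partition]
```

With the default mode, a rerun of the same job raises instead of duplicating rows. A caller that really wants to add data to an existing partition has to pass `PublishMode.APPEND` explicitly.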



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
