From: "Sushanth Sowmyan (JIRA)"
To: hive-dev@hadoop.apache.org
Date: Wed, 8 Oct 2014 03:25:34 +0000 (UTC)
Subject: [jira] [Commented] (HIVE-8371) HCatStorer should fail by default when publishing to an existing partition

[ https://issues.apache.org/jira/browse/HIVE-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163012#comment-14163012 ]

Sushanth Sowmyan commented on HIVE-8371:
----------------------------------------

It is going to flip the Hive behaviour, in that it will disallow INSERT INTO if there is already data; that was intentional, to be consistent between Hive and HCatalog. The question is: do we want to allow appends to data? If so, Hive and HCatalog should both allow it. If not, Hive and HCatalog should both deny it.
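To make the consistency argument concrete, here is a minimal sketch, not Hive's actual code, in which both write paths (a Hive INSERT INTO and an HCatStorer publish) consult one shared policy before writing to a partition that already holds data. The names `AppendPolicy`, `check_publish`, `hive_insert_into`, and `hcat_store` are hypothetical stand-ins chosen for illustration.

```python
from enum import Enum


class AppendPolicy(Enum):
    """Hypothetical warehouse-wide setting: may data be appended to a non-empty partition?"""
    ALLOW = "allow"
    DENY = "deny"


class PartitionExistsError(Exception):
    """Raised when a write targets a partition that already has data and appends are denied."""


def check_publish(partition_has_data: bool, policy: AppendPolicy) -> str:
    """Single decision point shared by every write path; returns the action to take."""
    if not partition_has_data:
        return "create"
    if policy is AppendPolicy.ALLOW:
        return "append"
    raise PartitionExistsError("partition already contains data; appends are disabled")


def hive_insert_into(partition_has_data: bool, policy: AppendPolicy) -> str:
    # Hypothetical stand-in for Hive's INSERT INTO path: defers to the shared policy.
    return check_publish(partition_has_data, policy)


def hcat_store(partition_has_data: bool, policy: AppendPolicy) -> str:
    # Hypothetical stand-in for HCatStorer's commit path: defers to the same policy.
    return check_publish(partition_has_data, policy)
```

Because both callers delegate to `check_publish`, changing the one setting changes Hive and HCatalog together, rather than relying on two separately defaulted settings that can drift apart.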
I do understand the concern that HCatStorer behaviour has changed after being out for a long time, but by the same reasoning, the new HCatStorer behaviour has also been out for a while now, in publicly released Hive. The old behaviour could still be preserved with yet another warehouse-level parameter for legacy behaviour, making HCatStorer default to immutable and Hive default to mutable, but honestly, I think that's ugly and will cause more maintainability problems going forward.

> HCatStorer should fail by default when publishing to an existing partition
> --------------------------------------------------------------------------
>
>                 Key: HIVE-8371
>                 URL: https://issues.apache.org/jira/browse/HIVE-8371
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.13.0, 0.14.0, 0.13.1
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>              Labels: hcatalog, partition
>
> In Hive-12 and before (or in previous HCatalog releases), HCatStorer would fail if the partition already existed (either before launching the job or during commit, depending on the partitioning). HIVE-6406 changed that behavior, and HCatStorer now appends by default. This causes data quality issues, since a rerun (or duplicate run) won't fail (as it used to) and will just append to the partition.
> A preferable approach would be to leave HCatStorer behavior as is (fail on a duplicate publish) and support append through an option. Overwrite can also be implemented in a similar fashion. E.g.:
> store A into 'db.table' using org.apache.hive.hcatalog.pig.HCatStorer('partspec', '', ' -append');
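The proposed semantics (fail on a duplicate publish by default, append only when asked) can be modeled as a small decision function. This is a minimal sketch under stated assumptions, not HCatStorer's implementation: the `-append` flag is the option proposed in the description, not an existing HCatStorer argument; the `-overwrite` spelling is a hypothetical guess at how "overwrite in a similar fashion" might look; and `WriteMode`, `parse_mode`, and `publish` are hypothetical names.

```python
from enum import Enum


class WriteMode(Enum):
    FAIL = "fail"            # proposed default: refuse a duplicate publish
    APPEND = "append"        # opt-in via the proposed "-append" option
    OVERWRITE = "overwrite"  # hypothetical "-overwrite" option, same pattern


class DuplicatePublishError(Exception):
    """Raised when the target partition already exists and no opt-in mode was given."""


def parse_mode(options: str) -> WriteMode:
    """Parse an option string such as ' -append' (the third HCatStorer argument in the proposal).

    Unrecognized tokens are ignored in this sketch.
    """
    flags = set(options.split())
    if "-append" in flags and "-overwrite" in flags:
        raise ValueError("-append and -overwrite are mutually exclusive")
    if "-append" in flags:
        return WriteMode.APPEND
    if "-overwrite" in flags:
        return WriteMode.OVERWRITE
    return WriteMode.FAIL


def publish(partition_exists: bool, options: str = "") -> str:
    """Decide what a store into a partition should do under the proposed semantics."""
    mode = parse_mode(options)
    if not partition_exists:
        return "create"
    if mode is WriteMode.FAIL:
        raise DuplicatePublishError("partition already exists; pass -append or -overwrite")
    return mode.value
```

Under this model, rerunning the same job without an option raises instead of silently appending duplicate rows, which is the data-quality concern raised in the description.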