hive-dev mailing list archives

From "Edward Capriolo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
Date Mon, 18 Nov 2013 18:51:30 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825611#comment-13825611 ]

Edward Capriolo commented on HIVE-5317:
---------------------------------------

I have two fundamental problems with this concept.
{quote}
The only requirement is that the file format must be able to support a rowid. With things
like text and sequence file this can be done via a byte offset.
{quote}

This is a good reason not to do this. Features that only work for some formats create fragmentation.
What about formats that do not have a row ID? What if the user is already using the key for
something else, like data?
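To make the quoted byte-offset idea concrete, here is a minimal Python sketch. It assumes a newline-delimited text file already read into memory and does not model sequence files. `byte_offset_row_ids` is a hypothetical helper, not a Hive API. Each record's row ID is the offset of its first byte, so the IDs are only valid for that exact file: rewriting an earlier record with a different length shifts every later ID.

```python
def byte_offset_row_ids(data: bytes) -> list:
    """Pair each newline-terminated record with the byte offset where it starts.

    Hypothetical helper for illustration only; not part of Hive.
    """
    rows = []
    offset = 0
    for line in data.splitlines(keepends=True):
        # The offset of the record's first byte serves as its row ID.
        rows.append((offset, line.rstrip(b"\r\n")))
        # Advance by the full record length, including its line terminator.
        offset += len(line)
    return rows


rows = byte_offset_row_ids(b"alice,10\nbob,20\ncarol,30\n")
# rows == [(0, b"alice,10"), (9, b"bob,20"), (16, b"carol,30")]
```

Shortening the first record from `alice,10` to `al,10` moves `bob,20` from offset 9 to offset 6, which is why such IDs identify rows only within one immutable file.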

{quote}
Once an hour a log of transactions is exported from an RDBMS and the fact tables need to be
updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts,
updates, and deletes. The table is partitioned and bucketed.
{quote}

What this ticket describes seems like a bad use case for Hive. Why would the user not simply
create a new table partitioned by hour? What is the need to transactionally update a table
in place?

It seems like the better solution would be for the user to log these updates themselves and
then periodically export the table with a tool like Sqoop.
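The hour-partitioned alternative suggested above can be sketched as follows, assuming Hive's usual `column=value` partition directory layout. The `dt`/`hr` partition columns, the warehouse path and both helper functions are hypothetical, and an in-memory dict stands in for the table. Each hourly export lands in a new partition; existing partitions are never rewritten in place.

```python
from datetime import datetime


def hourly_partition_path(table_root: str, ts: datetime) -> str:
    """Hive-style partition directory (column=value) for the hour containing ts.

    The dt/hr partition column names are hypothetical.
    """
    return f"{table_root}/dt={ts:%Y-%m-%d}/hr={ts:%H}"


def load_hourly_export(table: dict, table_root: str, ts: datetime, rows: list) -> str:
    """Append one hourly export as a brand-new partition (in-memory stand-in).

    `table` maps partition paths to row lists. Loading the same hour twice is
    rejected instead of overwriting or updating existing data.
    """
    path = hourly_partition_path(table_root, ts)
    if path in table:
        raise ValueError(f"partition already loaded: {path}")
    table[path] = list(rows)
    return path


table = {}
load_hourly_export(table, "/warehouse/txn_log", datetime(2013, 11, 18, 18, 51), ["row1", "row2"])
```

Because every load only adds a partition, readers never see a half-updated partition, at the cost of pushing any "latest value per key" logic onto queries or downstream exports.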

I see this as a really complicated piece of work for a narrow use case, and I have a very
difficult time believing that adding transactions to Hive to support this is the right answer.

> Implement insert, update, and delete in Hive with full ACID support
> -------------------------------------------------------------------
>
>                 Key: HIVE-5317
>                 URL: https://issues.apache.org/jira/browse/HIVE-5317
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: InsertUpdatesinHive.pdf
>
>
> Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are:
> * INSERT INTO tbl SELECT …
> * INSERT INTO tbl VALUES ...
> * UPDATE tbl SET … WHERE …
> * DELETE FROM tbl WHERE …
> * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ...
> * SET TRANSACTION LEVEL …
> * BEGIN/END TRANSACTION
> Use Cases
> * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (e.g. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys.
> * Once a day, a small set (up to 100k rows) of records needs to be deleted for regulatory compliance.
> * Once an hour, a log of transactions is exported from an RDBMS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed.
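The use cases quoted above boil down to applying a keyed log of inserts, updates and deletes to a table. Purely to illustrate those semantics (not how Hive would store or execute them), here is a minimal in-memory Python sketch; `apply_change_log` and the `(operation, key, value)` entry shape are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One change-log entry: (operation, primary key, new value or None for deletes).
Change = Tuple[str, int, Optional[str]]


def apply_change_log(table: Dict[int, str], changes: List[Change]) -> Dict[int, str]:
    """Apply inserts, updates and deletes in log order to a copy of `table`.

    Inserts and updates are both treated as upserts keyed on the primary key,
    similar to a MERGE that updates when matched and inserts when not matched.
    """
    result = dict(table)
    for op, key, value in changes:
        if op == "delete":
            result.pop(key, None)  # deleting a missing key is a no-op
        elif op in ("insert", "update"):
            if value is None:
                raise ValueError(f"{op} for key {key} has no value")
            result[key] = value
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


current = {1: "alice", 2: "bob"}
log = [("update", 1, "alice v2"), ("delete", 2, None), ("insert", 3, "carol")]
updated = apply_change_log(current, log)
```

Order matters: entries are applied in log order, so a later change to the same key wins, and the input table is left untouched.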



--
This message was sent by Atlassian JIRA
(v6.1#6144)
