hive-dev mailing list archives

From "Kevin Wilfong (JIRA)" <>
Subject [jira] [Commented] (HIVE-3106) Add option to make multi inserts more atomic
Date Mon, 11 Jun 2012 20:45:43 GMT


Kevin Wilfong commented on HIVE-3106:

Per Carl's comments, I explicitly stated the advantages and disadvantages, removed "atomic"
from the name of the configuration variable (the behavior is not truly atomic), and removed
references to "outputs" from the description of the config.

Also fixed an issue: if a file took a long time to produce, there would still be a long gap
between when the tables/partitions were produced and when the locks on them were released.
Now, when the option is set, the DependencyCollection task depends on the dependencies of the
move tasks for files, but the move tasks for files do not depend on the DependencyCollection
task: there are no locks on these files, so delaying them would bring no advantage.
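
The wiring described above can be sketched as a small dependency map. This is only an
illustration, not code from the patch: the task names, the "table"/"file" labels, and the
build_dependencies helper are all hypothetical, and the assumption that table/partition
move tasks wait on the DependencyCollection task is inferred from the issue description.

```python
from collections import defaultdict

COLLECTOR = "DependencyCollection"  # stands in for Hive's task of that name


def build_dependencies(inserts, option_enabled):
    """Return {task: set of tasks it waits for}.

    inserts is a list of (producer_task, move_task, kind) tuples, where kind
    is "table" (table/partition output, locked) or "file" (no lock).
    All names are hypothetical and used only for illustration.
    """
    deps = defaultdict(set)
    for producer, move, kind in inserts:
        if not option_enabled:
            # Default behaviour: each move runs as soon as its own output is ready.
            deps[move].add(producer)
            continue
        # With the option set, the collector waits for every producer,
        # including the producers of plain files.
        deps[COLLECTOR].add(producer)
        if kind == "table":
            # Assumed: locked outputs are moved only after the collector.
            deps[move].add(COLLECTOR)
        else:
            # File moves hold no locks, so they are not held back.
            deps[move].add(producer)
    return dict(deps)


inserts = [
    ("produce_t1", "move_t1", "table"),
    ("produce_t2", "move_t2", "table"),
    ("produce_f1", "move_f1", "file"),
]
default_deps = build_dependencies(inserts, option_enabled=False)
grouped_deps = build_dependencies(inserts, option_enabled=True)
```

In this sketch a slow file producer delays the table/partition moves (through the
collector), while the file move itself still runs as soon as its own producer finishes.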

Added a new test case for this additional functionality.
> Add option to make multi inserts more atomic
> --------------------------------------------
>                 Key: HIVE-3106
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-3106.1.patch.txt
> Currently, with multi-insert queries, as soon as the output of one of the inserts is ready
the move task associated with that insert is run, creating the table/partition.  However,
if concurrency is enabled, the lock on this table/partition is not released until the entire
query finishes, which can be much later.
> This causes issues if, for example, a user is waiting for an output of the multi-insert
query that is created long before the other outputs, and is checking for its existence using
the metastore's Thrift methods (get_table/get_partition).  In that case, the user will run
their query which uses the output, and it will experience a timeout trying to acquire the
lock on the table/partition.
> If all the move tasks depend on the parents of all other move tasks, the output creation
will be much closer to atomic, relieving this problem.
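
To make the timing problem in the description concrete, here is a deliberately simplified
model. The output names and times are made up, the lock_exposure helper is hypothetical,
and the model ignores the duration of the move tasks themselves, which is why the real
behavior is only "closer to atomic".

```python
def lock_exposure(ready_times, option_enabled):
    """Seconds each output is visible in the metastore while still locked.

    ready_times maps an output name to the time (in seconds) at which its
    data is ready. Assumes locks are released when the last output is ready
    and that moves are instantaneous; both are simplifications.
    """
    query_end = max(ready_times.values())
    exposure = {}
    for name, ready in ready_times.items():
        # Without the option an output becomes visible as soon as it is ready;
        # with it, visibility is deferred until every output is ready.
        visible_at = query_end if option_enabled else ready
        exposure[name] = query_end - visible_at
    return exposure


ready_times = {"t1": 60, "t2": 3600}  # hypothetical: t1 is ready long before t2
before = lock_exposure(ready_times, option_enabled=False)
after = lock_exposure(ready_times, option_enabled=True)
```

Under these assumptions, t1 is visible but locked for 3540 seconds without the option,
and for 0 seconds with it.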

This message is automatically generated by JIRA.