hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <>
Subject [jira] [Commented] (HIVE-13622) WriteSet tracking optimizations
Date Mon, 16 May 2016 19:09:12 GMT


Eugene Koifman commented on HIVE-13622:

[~teabot] could you look at this patch, please? I made changes to Lock and TestLock, but they
are not perfect.
I now need the client to set two additional properties on LockComponent: one to indicate the
operation type (CRUD) and another to indicate whether the resource is acid or not. I made the
changes in Lock conservatively to make sure the old behavior is preserved, but ideally it should
differentiate all 4 operations. It wasn't obvious to me how to get that information.
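
For illustration only, here is a minimal sketch of what carrying those two properties on a lock component could look like, and how they would let the server skip read-only components (item 5 in the description quoted below). The class LockComponentSketch, its OpType enum, and both method names are hypothetical and are not the actual Hive or Thrift types. Treating non-acid resources as not needing tracking is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a lock component; NOT the real Hive LockComponent. */
final class LockComponentSketch {
    /** The four CRUD operations the client would report (names are assumptions). */
    enum OpType { INSERT, SELECT, UPDATE, DELETE }

    final String table;
    final OpType opType;
    final boolean acid;

    LockComponentSketch(String table, OpType opType, boolean acid) {
        this.table = table;
        this.opType = opType;
        this.acid = acid;
    }

    /** Assumption: only writes to acid resources are worth recording. */
    boolean needsWriteTracking() {
        return acid && opType != OpType.SELECT;
    }

    /** Keeps only the components that would be recorded; pure reads are dropped. */
    static List<LockComponentSketch> componentsToRecord(List<LockComponentSketch> requested) {
        List<LockComponentSketch> kept = new ArrayList<>();
        for (LockComponentSketch c : requested) {
            if (c.needsWriteTracking()) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

With the example from the description, delete from T where t1 in (select c1 from C), the component for T would be reported as DELETE and the one for C as SELECT, so only T would be kept.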

> WriteSet tracking optimizations
> -------------------------------
>                 Key: HIVE-13622
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.3.0, 2.1.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Critical
>         Attachments: HIVE-13622.2.patch, HIVE-13622.3.patch, HIVE-13622.4.patch
> HIVE-13395 solves the lost update problem with some inefficiencies.
> 1. TxnHandler.OperationType is currently derived from LockType.  This doesn't distinguish
> between Update and Delete, but the distinction would be useful.  See comments in TxnHandler.
> Should be able to pass in Insert/Update/Delete info from the client into TxnHandler.
> 2. TxnHandler.addDynamicPartitions() should know the OperationType as well from the client.
> It currently extrapolates it from TXN_COMPONENTS.  This works but requires extra SQL statements
> and is thus less performant.  It will not work with multi-stmt txns.  See comments in the code.
> 3. TxnHandler.checkLock() - see more comments around "isPartOfDynamicPartitionInsert".
> If TxnHandler knew whether it is being called as part of an op running with dynamic partitions,
> it could be more efficient.  In that case we don't have to write to TXN_COMPONENTS at all
> during lock acquisition.  Conversely, if not running with DynPart, then we can kill the current
> txn on lock grant rather than wait until commit time.
> 4. TxnHandler.addDynamicPartitions() - the insert stmt here should combine multiple rows
> into a single SQL stmt (but with a limit for extreme cases)
> 5. TxnHandler.enqueueLockWithRetry() - this currently adds components that are only being
> read to TXN_COMPONENTS.  This is useless at best since read ops don't generate anything to
> compact.  For example, delete from T where t1 in (select c1 from C) - no reason to add C to
> txn_components but we do.
> All of these require some Thrift changes.
> Once done, re-enable TestDbTxnHandler2.testWriteSetTracking11()
> Also see comments in [here|]
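
Item 4 above (combining multiple rows into a single SQL statement, with a cap) can be sketched roughly as follows. This is not the TxnHandler code: BatchedInsertSketch and buildBatchedInserts are hypothetical names, the rows are assumed to be already-formatted value tuples, and the sketch assumes the backing database accepts multi-row VALUES lists. Real code would more likely use prepared statements with bound parameters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hive: groups rows into bounded multi-row INSERTs. */
final class BatchedInsertSketch {
    private BatchedInsertSketch() {}

    /**
     * @param insertPrefix   statement text up to and including "VALUES "
     * @param rows           pre-formatted tuples, e.g. "(1, 'p=1')"
     * @param maxRowsPerStmt cap on rows per statement, to guard against extreme cases
     * @return one statement per chunk of at most maxRowsPerStmt rows
     */
    static List<String> buildBatchedInserts(String insertPrefix, List<String> rows,
                                            int maxRowsPerStmt) {
        if (maxRowsPerStmt < 1) {
            throw new IllegalArgumentException("maxRowsPerStmt must be >= 1");
        }
        List<String> statements = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += maxRowsPerStmt) {
            int end = Math.min(start + maxRowsPerStmt, rows.size());
            // Join this chunk's tuples into a single VALUES list.
            statements.add(insertPrefix + String.join(", ", rows.subList(start, end)));
        }
        return statements;
    }
}
```

The cap trades round trips against statement size: a larger maxRowsPerStmt means fewer statements, but each one is longer and more likely to hit a database-specific limit.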
