hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-15032) Update/Delete statements use dynamic partitions when it's not necessary
Date Sat, 22 Oct 2016 00:12:58 GMT

     [ https://issues.apache.org/jira/browse/HIVE-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-15032:
----------------------------------
    Description: 
{noformat}
create table if not exists TAB_PART (a int, b int) partitioned by (p string)
  clustered by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');

-- this uses a static partition
insert into TAB_PART partition(p='blah') values(1,2);
-- this uses DP... WHY?
update TAB_PART set b = 7 where p = 'blah';
{noformat}

The Update is rewritten into an Insert statement, but SemanticAnalyzer.genFileSink() sets up
this Insert with dynamic partitions (DP).

At least in theory, we should be able to analyze the WHERE clause so that the Insert doesn't
have to use DP.
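
To make the idea concrete, here is a minimal sketch of such a WHERE-clause check. It is not Hive code: StaticPartitionSketch, EqualsLiteral and staticSpec() are hypothetical names, and the WHERE clause is assumed to be already flattened into AND-ed `column = 'literal'` conjuncts. If every partition column is pinned to a single literal, the rewritten Insert could in principle use a static partition spec; otherwise it falls back to DP.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch only; none of these types exist in Hive. */
final class StaticPartitionSketch {

    /** One AND-ed conjunct of a WHERE clause of the form: column = 'literal'. */
    static final class EqualsLiteral {
        final String column;
        final String literal;

        EqualsLiteral(String column, String literal) {
            this.column = column;
            this.literal = literal;
        }
    }

    /**
     * Returns a static partition spec when every partition column is pinned to
     * exactly one literal by the conjuncts; returns empty (meaning "use DP")
     * when a partition column is unconstrained or constrained inconsistently.
     */
    static Optional<Map<String, String>> staticSpec(List<String> partitionColumns,
                                                    List<EqualsLiteral> conjuncts) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (String partCol : partitionColumns) {
            String value = null;
            for (EqualsLiteral c : conjuncts) {
                if (!c.column.equalsIgnoreCase(partCol)) {
                    continue; // predicate on a non-partition column, e.g. a = 1
                }
                if (value != null && !value.equals(c.literal)) {
                    return Optional.empty(); // p = 'x' AND p = 'y': give up
                }
                value = c.literal;
            }
            if (value == null) {
                return Optional.empty(); // partition column not pinned
            }
            spec.put(partCol, value);
        }
        return Optional.of(spec);
    }

    public static void main(String[] args) {
        // update TAB_PART set b = 7 where p = 'blah'
        System.out.println(staticSpec(
                Arrays.asList("p"),
                Arrays.asList(new EqualsLiteral("p", "blah"))));
        // update TAB_PART set b = 7 where a = 1 (partition column not pinned)
        System.out.println(staticSpec(
                Arrays.asList("p"),
                Arrays.asList(new EqualsLiteral("a", "1"))));
    }
}
```

A real implementation would have to work on the analyzer's own expression tree and handle OR, IN, ranges and non-literal expressions; the sketch only covers the simple equality case from the example above.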

Another important side effect of this is how locks are acquired. If the table doesn't have
partition 'blah', then as things stand a SHARED_WRITE lock is acquired on the whole TAB_PART table.
However, it would suffice to acquire a SHARED_WRITE on the single partition operated on or,
better yet, to short-circuit the query.

If the table does have partition 'blah', we get only the partition lock.

See TestDbTxnManager2.testWriteSetTracking3() and testWriteSetTracking5().
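
The locking behavior suggested above could be sketched roughly as follows. This is only an illustration of the proposal, not the DbTxnManager logic: LockScopeSketch, Plan and plan() are hypothetical names, and the statically known partition is assumed to come from some analysis of the WHERE clause.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch only; not how Hive's lock manager is structured. */
final class LockScopeSketch {

    enum Plan {
        LOCK_PARTITION, // SHARED_WRITE on the single partition
        LOCK_TABLE,     // SHARED_WRITE on the whole table (DP case)
        SKIP_QUERY      // target partition does not exist: nothing to update
    }

    /**
     * Chooses the lock scope for an Update/Delete.
     *
     * @param staticPartition    partition name if the WHERE clause pins one, else empty
     * @param existingPartitions partitions currently present in the table
     */
    static Plan plan(Optional<String> staticPartition, Set<String> existingPartitions) {
        if (!staticPartition.isPresent()) {
            return Plan.LOCK_TABLE; // target partitions unknown until runtime
        }
        if (existingPartitions.contains(staticPartition.get())) {
            return Plan.LOCK_PARTITION;
        }
        return Plan.SKIP_QUERY; // no rows can match, so short-circuit
    }

    public static void main(String[] args) {
        Set<String> parts = new HashSet<>(Arrays.asList("p=blah"));
        System.out.println(plan(Optional.of("p=blah"), parts));
        System.out.println(plan(Optional.of("p=other"), parts));
        System.out.println(plan(Optional.empty(), parts));
    }
}
```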


> Update/Delete statements use dynamic partitions when it's not necessary
> -----------------------------------------------------------------------
>
>                 Key: HIVE-15032
>                 URL: https://issues.apache.org/jira/browse/HIVE-15032
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman



