hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Furcy Pin <pin.fu...@gmail.com>
Subject Re: Hive locking mechanism on read partition.
Date Thu, 12 Oct 2017 13:23:25 GMT
According to the documentation link you gave:

"A 'S' lock on table and relevant partition is acquired when a read is
being performed. For all other operations, an 'X' lock is taken on the
partition. However, if the change is only applicable to the newer
partitions, a 'S' lock is acquired on the table, whereas if the change is
applicable to all partitions, a 'X' lock is acquired on the table..."

They also give an example:

*insert into T2(partition P2) select .. T1 partition P1*

*S on T2, T1, T1.P1 and X on T2.P2*


However, it seems that someone else wrote this bellow too:

*The recipe listed above will not work as specified, because of the
hierarchical nature of locks.*

The 'S' lock for table T is specified as follows:

   - Call create( ) to create a node with pathname "/warehouse/T/read-".
   This is the lock node used later in the protocol. Make sure to set the
   sequence and ephemeral flag.
   - *Call getChildren( ) on the lock node without setting the watch flag.*
   - *If there is a child with a pathname starting with "write-" and a
   lower sequence number than the one obtained, the lock cannot be acquired.
   Delete the node created in the first step and return.*
   - Otherwise the lock is granted.

The 'X' lock for table T is specified as follows:

   - Call create( ) to create a node with pathname "/warehouse/T/write-".
   This is the lock node used later in the protocol. Make sure to set the
   sequence and ephemeral flag.
   - Call getChildren( ) on the lock node without setting the watch flag.
   - If there is a child with a pathname starting with "read-" or "write-"
   and a lower sequence number than the one obtained, the lock cannot be
   acquired. Delete the node created in the first step and return.
   - Otherwise the lock is granted.



So, as I understand it, the S lock on results_table is blocked by the X
lock on partition task_id=5556,
which means that getChildren() on the results_table finds a child
(partition task_id=5556) with a X lock
and doesn't grant the lock.

I am not part of the Hive development team, but I would say that this looks
like a bug, or at least
a misconception to me. But I found many very old JIRA issues about locking
that are still open, so I don't know
if this feature is actively maintained.

It would be preferable to test this on a more recent version before
submitting a JIRA, though.


As an unsafe workaround, you can add this line before each of your read
queries:

SET hive.support.concurrency=false ;





2017-10-12 12:58 GMT+02:00 Igor Kuzmenko <f1sherox@gmail.com>:

> Hello, I'm using HDP 2.5.0.0  with included hive 1.2.1. And I have problem
> with locking mechanism.
>
> Most of my queries to hive looks like this.
>
> *(1)    insert into table results_table partition(task_id=${task_id})*
> *        select * from data_table  where ....;*
>
> results_table partitioned by task_id field and i expect to get exclusive
> lock on corresponding partition.Which is true:
>
> Lock ID Database         Table Partition   State    Blocked By Type
>     Transaction ID
> 136639682.4 default results_table         task_id=5556   ACQUIRED
>                      EXCLUSIVE           NULL
>
>
>
> Another type of query is fetching data from results_table:
>
> *(2)  select * from results_table where task_id = ${task_id}*
>
> This select doesn't require any map reduce and executes fast. This is
> exactly what I want.
> But if I execute this two queries at the same time I can't perform read
> from result_table partition while inserting data into another.
>
> Locks looks like this:
>
> Lock ID Database         Table Partition   State         Blocked By Type
>       Transaction ID
> 136639682.4 default results_table         task_id=5556   ACQUIRED
>                             EXCLUSIVE         NULL
> 136639700.1 default results_table         NULL   WAITING 136639682.4
> SHARED_READ NULL
> 136639700.2 default results_table         task_id=5535    WAITING
>                     SHARED_READ NULL
>
>
> Reading data from specified partition requires shared lock on whole table.
> This prevents me to get data untill first query completes.
>
> As I can see on this page
> <https://cwiki.apache.org/confluence/display/Hive/Locking#Locking-UseCases> this
> is expected behaivor. But I don't understand why we need lock on table.
> Can I get rid of shared lock on whole table, while still having shared
> lock on specific partition?
>
>
>
>

Mime
View raw message