hive-user mailing list archives

From Eugene Koifman <ekoif...@hortonworks.com>
Subject Re: Hive locking mechanism on read partition.
Date Fri, 13 Oct 2017 17:21:36 GMT
I don’t think there is any way for you to get rid of the table-level shared lock (though
it may be a reasonable improvement to make).
You could use hive.txn.strict.locking.mode
(https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.strict.locking.mode)
to change the X lock on write to an S lock to get around this, but this may not be
appropriate for the rest of your logic.
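
A minimal sketch of that suggestion, assuming the property is present in your Hive build (the linked page lists the version it was added in, and it may not exist in stock Hive 1.2.1) and that DbTxnManager is the lock manager, as in this thread:

```sql
-- Assumption: this Hive build recognizes hive.txn.strict.locking.mode.
-- false = non-strict mode: an INSERT into a non-transactional table takes a
-- shared lock instead of an exclusive one.
SET hive.txn.strict.locking.mode=false;

-- Then run the insert (query (1) from the original message) in the same session.
```

With the write lock relaxed to shared, the lock manager no longer serializes concurrent writes to the same partition, so this only fits if nothing else in the workflow relies on that exclusivity.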

Eugene

From: Igor Kuzmenko <f1sherox@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, October 13, 2017 at 2:16 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive locking mechanism on read partition.

Hi, Eugene.

Tables are not transactional and locks are backed by DbTxnManager.

On Fri, Oct 13, 2017 at 2:30 AM, Eugene Koifman <ekoifman@hortonworks.com> wrote:
Which lock manager are you using?
Do you have acid enabled and if so are these tables transactional?

Eugene


From: Igor Kuzmenko <f1sherox@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, October 12, 2017 at 3:58 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Hive locking mechanism on read partition.

Hello, I'm using HDP 2.5.0.0 with the included Hive 1.2.1, and I have a problem with the locking mechanism.

Most of my queries to Hive look like this:

(1)    insert into table results_table partition(task_id=${task_id})
        select * from data_table  where ....;

results_table is partitioned by the task_id field, and I expect to get an exclusive lock on
the corresponding partition, which is what happens:

Lock ID      Database  Table          Partition     State     Blocked By   Type         Transaction ID
136639682.4  default   results_table  task_id=5556  ACQUIRED               EXCLUSIVE    NULL



Another type of query is fetching data from results_table:

(2)  select * from results_table where task_id = ${task_id}

This select doesn't require any MapReduce job and executes fast, which is exactly what I want.
But if I execute these two queries at the same time, I can't read from one results_table
partition while data is being inserted into another.

The locks look like this:

Lock ID      Database  Table          Partition     State     Blocked By   Type         Transaction ID
136639682.4  default   results_table  task_id=5556  ACQUIRED               EXCLUSIVE    NULL
136639700.1  default   results_table  NULL          WAITING   136639682.4  SHARED_READ  NULL
136639700.2  default   results_table  task_id=5535  WAITING                SHARED_READ  NULL
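
Listings of this shape come from Hive's SHOW LOCKS statement. A short sketch of how they might be reproduced while both queries are running, assuming the table and partition value from this thread; the exact columns and how well the table/partition filters are honored depend on the Hive version and lock manager:

```sql
-- All locks currently known to the lock manager.
SHOW LOCKS;

-- Locks on the table from this thread.
SHOW LOCKS results_table;

-- Locks on the partition being written by query (1); 5556 is the value shown in the listing.
SHOW LOCKS results_table PARTITION (task_id=5556);
```

The row to look for is a WAITING lock whose Partition is NULL: that is the table-level SHARED_READ request held up by the EXCLUSIVE lock on the partition being written.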


Reading data from a specific partition requires a shared lock on the whole table. This prevents
me from getting data until the first query completes.

As far as I can see from this page <https://cwiki.apache.org/confluence/display/Hive/Locking#Locking-UseCases>,
this is expected behavior, but I don't understand why we need a lock on the table.
Can I get rid of the shared lock on the whole table while still having a shared lock on the specific partition?



