hive-user mailing list archives

From Ch Wan <xmu.wc.2...@gmail.com>
Subject Re: doubt about locking mechanism in Hive
Date Thu, 18 Sep 2014 03:54:36 GMT
Hi,
We hit this in Hive 0.13.1 too: CREATE TEMPORARY FUNCTION blocks while a
SELECT is running in the same database.

I don't think DDL statements like this should need an EXCLUSIVE lock.

I found this patch <https://issues.apache.org/jira/browse/HIVE-6734>. It
seems to check the write type only in DbTxnManager. It may be a good idea
to check the DDL's write type in DummyTxnManager too.
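
For anyone who wants to reproduce this or try a workaround, here is a rough
HiveQL sketch. It reuses the database and table names from the original
report quoted below. my_udf and com.example.MyUdf are made-up placeholders.
It also assumes that SHOW LOCKS DATABASE is available on your 0.13.x build
and that hive.support.concurrency can be overridden per session, so treat it
as a starting point rather than a confirmed fix.

```sql
-- Session 1: a long-running read in the database (from the original report).
USE mydatabase;
SELECT * FROM competence LIMIT 1;

-- Session 2: while session 1 is still running, list the locks held on the
-- database. Assumption: SHOW LOCKS DATABASE exists in this Hive version.
SHOW LOCKS DATABASE mydatabase;

-- Possible workaround (assumes the property may be changed per session):
-- turn locking off only for the session that issues the DDL.
SET hive.support.concurrency=false;

-- my_udf and com.example.MyUdf are hypothetical; the class would need to be
-- on the session classpath (for example via ADD JAR) for this to succeed.
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf';
```

With locking off in that session the DDL should no longer wait for the
SELECT, but nothing else run in that session is protected by locks either.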

2014-09-09 22:48 GMT+08:00 Edward Capriolo <edlinuxguru@gmail.com>:

> We use our own library: simple constructions like files in HDFS that work
> like pid/lock files. A file like /flags/tablea/process1 could mean "hey, I'm
> working on table a, leave it alone". It accomplishes the exact same thing
> with less fuss, and it is also much easier for an external
> process/scheduler/shell script to integrate with this system. I doubt many
> use Hive locking as flow control for a scheduling system.
>
> On Tue, Sep 9, 2014 at 3:25 AM, wzc <wzc1989@gmail.com> wrote:
>
>> Hi,
>> We also encounter this in Hive 0.13. We need to enable concurrency in
>> our daily ETL workflows (to stop a child ETL job from reading its parent
>> job's output while the parent is still running).
>> We found that in Hive 0.13, opening the Hive CLI shell sometimes outputs
>> the message "conflicting lock present for default mode EXCLUSIVE" and
>> waits for some locks to be released. We haven't encountered this in Hive
>> 0.11 and are still trying to figure it out.
>>
>>
>>
>> 2014-08-25 15:21 GMT+08:00 Sourygna Luangsay <sluangsay@pragsis.com>:
>>
>>>  Many thanks Edward for this complete answer.
>>>
>>>
>>>
>>> So the main idea is simply to disable concurrency in Hive, if I
>>> understand you correctly.
>>>
>>>
>>>
>>> My question now is: is this something most Hive users do by default?
>>>
>>> Can anybody else share their own experience?
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> *Sourygna Luangsay*
>>>
>>>
>>>
>>> *From:* Edward Capriolo [mailto:edlinuxguru@gmail.com]
>>> *Sent:* Friday, 22 August 2014 16:07
>>> *To:* user@hive.apache.org
>>> *Subject:* Re: doubt about locking mechanism in Hive
>>>
>>>
>>>
>>> IMHO locking support should be turned off by default. I would argue if
>>> you are requiring this feature often you may be designing your systems
>>> improperly.
>>>
>>> You really should not have that many situations where you need locking
>>> in a write (mostly) once file system. The only time I have ever used it is
>>> if I had a process completely re-writing the contents of a table and I
>>> needed downstream things not to select from this table when it was in an
>>> inconsistent state. Having it on by default is a bad idea. You have pointed
>>> out a case where doing a simple select query attempts to acquire locks it
>>> does not need. That puts strain on more systems and creates more chances
>>> for issues.
>>>
>>>
>>>
>>> One of the big design philosophy issues I tend to have with Hive lately
>>> is that we have this pool of users (like myself) who use Hive for its
>>> original purpose: to query write-once text files and create aggregations.
>>>
>>> Then there are other groups attempting to implement very complicated
>>> semantics around streaming, transactions, locking, whatever. Then you have
>>> tools like Cloudera Manager giving configuration warnings such as:
>>>
>>> " Hive: Hive is not configured with ZooKeeper Service. As a result,
>>> hive-site will not contain hive.zookeeper.quorum, which can lead to
>>> corruption in concurrency scenarios."
>>>
>>> I think this statement is incorrect AND is BAD advice.  Then users such
>>> as yourself reach a conclusion like "I should turn on locking", because no
>>> one would ever assume that ....
>>>
>>> !!!SELECTING 1 ROW FROM A TABLE WOULD CAUSE 1100 LOCKS TO BE
>>> ACQUIRED!!!!
>>>
>>> ::rant over:: I am not saying that hive locking is bad, but I am saying
>>> I leave it off and turn it on when I need it on a per query basis.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Aug 22, 2014 at 8:48 AM, Sourygna Luangsay <
>>> sluangsay@pragsis.com> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> I am having some trouble with the locking/concurrency mechanism of Hive
>>> when doing a large select and trying to create a table at the same time.
>>>
>>> My version of Hive is 0.13.
>>>
>>>
>>>
>>> What I am trying to do is the following:
>>>
>>>
>>>
>>> 1)      In a hive shell:
>>> use mydatabase;
>>> select * from competence limit 1;     # this table has 1100 partitions.
>>> So with hive.support.concurrency=true, it needs at least 90s to execute (I
>>> know, this is a silly query: I should rather do a select * where “a
>>> partition”… The purpose of this query is to replicate easily the problem
>>> by having a query that needs a lot of time to execute)
>>>
>>>
>>>
>>> 2)      In another hive shell, while the 1st query is executing:
>>> use mydatabase;
>>> create table probsourygna (foo string) ROW FORMAT DELIMITED FIELDS
>>> TERMINATED BY '\t'  STORED AS TEXTFILE ;
>>>
>>> The problem is that the “create table” does not execute until the first
>>> query (select) has finished.
>>>
>>> And we can see messages of the following type:
>>>
>>> conflicting lock present for mydatabase mode EXCLUSIVE
>>>
>>> conflicting lock present for mydatabase mode EXCLUSIVE
>>>
>>> …
>>>
>>>
>>>
>>> (1 line every 60 s)
>>>
>>>
>>>
>>>
>>>
>>> It seems to me that the first query puts a shared lock at the database
>>> (mydatabase) level.
>>>
>>> Then, the second query tries to acquire an exclusive lock at the
>>> database level (fails and retries every 60s).
>>>
>>>
>>>
>>> Am I right? (when I look at the documentation
>>> https://cwiki.apache.org/confluence/display/Hive/Locking , it says
>>> nothing about locks at a database level)
>>>
>>> Is there any solution to my problem? (avoiding a long “select” to block
>>> a “create” query, without removing the concurrency of Hive)
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> *Sourygna Luangsay*
>>>
>>>
>>> CONFIDENTIALITY WARNING.
>>> This message and the information contained in or attached to it are
>>> private and confidential and intended exclusively for the addressee.
>>> Pragsis informs to whom it may receive it in error that it contains
>>> privileged information and its use, copy, reproduction or distribution is
>>> prohibited. If you are not an intended recipient of this E-mail, please
>>> notify the sender, delete it and do not read, act upon, print, disclose,
>>> copy, retain or redistribute any portion of this E-mail.
>>>
>>>
>>
>>
>
