hive-user mailing list archives

From Namit Jain <>
Subject Re: Hive Concurrency Model - does it work?
Date Wed, 26 Jan 2011 19:04:34 GMT
The patch below has been committed. was a follow-up patch which should help concurrency.
I have not tried backporting the patch to Hive 0.5 or Hive 0.6, but I don't think it will
work, since the code has changed significantly and a number of bug fixes to update the
inputs and outputs went in.

By default, concurrency is disabled. If you want to enable it, you need to set:
to true


From: Jay Ramadorai <<>>
Reply-To: <<>>
Date: Wed, 26 Jan 2011 13:52:58 -0500
To: <<>>
Subject: Hive Concurrency Model - does it work?

Is this JIRA truly fixed and included in
If so, can the patch be applied separately on top of 0.5.0 or 0.6.0?
Are there instructions somewhere for how to enable/integrate ZooKeeper with Hive for this
patch to work?
The JIRA comments indicate the patch was tested and committed; however, the wiki that the
JIRA points to implies concurrency will not be supported. Hence the confusion.
Is there a simple way in Hive to query which tables are currently being accessed?

More detail:
What I'm trying to do is run daily Sqoop imports into Hive from an external database. Jobs
are running on the Hive warehouse much of the time. I import the data into temporary tables
in Hive, then want to drop the permanent tables and rename the (just-imported) temporary ones
to the permanent names WITHOUT IMPACTING THE JOBS. At the moment, of course, doing an ALTER
TABLE RENAME causes any running job accessing the table to die on its next fetch. So I thought
that if the above JIRA was indeed fixed, 0.7.0 should let the job complete before the rename
gets its X lock, or, if the rename is in progress, the job won't get its S lock until the
rename is done. However, our test on 0.7.0 trunk (pulled in late September) shows that the
rename happens instantly even while a query is accessing the table, without waiting for any
locks.
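The S/X locking behavior expected above can be sketched as a small in-process model. This is a hypothetical illustration only: `TableLockManager`, `try_lock`, the table name `sales` and the owner labels are invented for this sketch, and it does not reproduce Hive's ZooKeeper-backed lock manager. It only shows the compatibility rule: shared (S) locks coexist, while an exclusive (X) lock needs the table to itself.

```python
from collections import defaultdict


class TableLockManager:
    """Toy model of shared (S) / exclusive (X) table locks.

    Hypothetical illustration: readers share a table, a writer (such as
    a rename) needs it exclusively. Not Hive's actual implementation.
    """

    def __init__(self):
        # table name -> set of owners currently holding an S lock
        self._shared = defaultdict(set)
        # table name -> owner holding the X lock (absent if none)
        self._exclusive = {}

    def try_lock(self, table, owner, mode):
        """Try to take a lock without blocking; return True on success."""
        if mode == "S":
            # Readers are only blocked by an existing exclusive holder.
            if table in self._exclusive:
                return False
            self._shared[table].add(owner)
            return True
        if mode == "X":
            # A writer needs no readers and no other writer on the table.
            if table in self._exclusive or self._shared[table]:
                return False
            self._exclusive[table] = owner
            return True
        raise ValueError(f"unknown lock mode: {mode!r}")

    def unlock(self, table, owner):
        """Release whatever lock `owner` holds on `table`."""
        self._shared[table].discard(owner)
        if self._exclusive.get(table) == owner:
            del self._exclusive[table]


if __name__ == "__main__":
    locks = TableLockManager()
    # A long-running query reads the permanent table.
    print(locks.try_lock("sales", "query-1", "S"))  # True
    # The rename is refused: a reader still holds an S lock.
    print(locks.try_lock("sales", "rename", "X"))   # False
    locks.unlock("sales", "query-1")
    print(locks.try_lock("sales", "rename", "X"))   # True
    # New queries are refused until the rename releases its X lock.
    print(locks.try_lock("sales", "query-2", "S"))  # False
```

Here a refused `try_lock` stands in for "wait and retry"; a real lock manager would block or poll until the conflicting lock is released.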

Barring this patch, are there any other ideas anyone can suggest for accomplishing what I
want? Some ideas we have considered:
- Parse the Hive logs/XML files for a table name to determine whether a job is currently
accessing the table. If not, then rename.
- Create views on temporary tables named by day, and have jobs query the views. When we are
ready to switch, replace the view so that it points to today's new table. The key question
here is: is the view metadata consulted only when a query starts, or is it looked at
repeatedly during query execution? If only at startup, we might be able to get away with
this trick until concurrency truly works.
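The view-swap idea can be sketched as follows, assuming (this is not confirmed in the thread) that a view is expanded to its underlying table once, when a query is compiled. `view_swap_statements`, `Catalog`, `Query` and the `daily_sales`/`sales_YYYYMMDD` names are hypothetical; the toy catalog stands in for the metastore and only shows which table a query binds to before and after the swap. A DROP VIEW/CREATE VIEW pair is used rather than an in-place view alteration, which older Hive releases may not support.

```python
from dataclasses import dataclass, field


def view_swap_statements(view, new_table):
    """Build HiveQL that repoints `view` at `new_table` (hypothetical names)."""
    return [
        f"DROP VIEW IF EXISTS {view}",
        f"CREATE VIEW {view} AS SELECT * FROM {new_table}",
    ]


@dataclass
class Catalog:
    """Toy stand-in for the metastore: view name -> underlying table name."""
    views: dict = field(default_factory=dict)

    def resolve(self, name):
        # Follow a view to its table; plain table names pass through unchanged.
        return self.views.get(name, name)


@dataclass
class Query:
    """A query that resolves views once, at compile time (the assumption)."""
    source: str
    resolved_table: str = ""

    def compile(self, catalog):
        # Bind the underlying table now; later catalog changes are not seen.
        self.resolved_table = catalog.resolve(self.source)


if __name__ == "__main__":
    catalog = Catalog({"daily_sales": "sales_20110125"})
    running = Query("daily_sales")
    running.compile(catalog)  # yesterday's table is bound here

    for stmt in view_swap_statements("daily_sales", "sales_20110126"):
        print(stmt)
    catalog.views["daily_sales"] = "sales_20110126"  # effect of the swap

    later = Query("daily_sales")
    later.compile(catalog)
    print(running.resolved_table)  # sales_20110125
    print(later.resolved_table)    # sales_20110126
```

If that assumption holds, the previous day's table should still be kept until every query compiled against it has finished, since dropping it early would break those queries just as the rename does.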

