hive-dev mailing list archives

From "Elliot West (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-12285) Add locking to HCatClient
Date Wed, 28 Oct 2015 12:10:27 GMT
Elliot West created HIVE-12285:
----------------------------------

             Summary: Add locking to HCatClient
                 Key: HIVE-12285
                 URL: https://issues.apache.org/jira/browse/HIVE-12285
             Project: Hive
          Issue Type: Improvement
          Components: HCatalog
    Affects Versions: 2.0.0
            Reporter: Elliot West
            Assignee: Elliot West


With the introduction of a concurrency model (HIVE-1293), Hive uses locks to coordinate access
and updates to both table data and metadata. Within the Hive CLI such lock management is seamless.
However, Hive provides additional APIs that permit interaction with data repositories, namely
the HCatalog APIs. Currently, operations implemented by these APIs do not participate in Hive's
locking scheme. Furthermore, the HCatalog APIs do not expose the locking mechanisms (unlike
the Metastore Thrift API, which does), so users cannot interact with locks explicitly
either. This leaves users of the APIs with no choice but to manipulate these data
repositories outside the control of Hive's lock management, which can lead to data
inconsistencies both for external processes using the API and for queries executing
within Hive.

h3. Scope of work
This ticket is concerned with sections of the HCatalog API that deal with DDL-type operations
using the metastore, not with those whose purpose is to read/write table data. A separate
issue already exists for adding locking to HCat readers and writers (HIVE-6207).

h3. Proposed work
The following work items would serve as a minimum deliverable that would allow API users
to work with locks effectively:
* Comprehensively document on the wiki the locks required for various Hive operations. At
a minimum this should cover all operations exposed by {{HCatClient}}. The [Locking design
document|https://cwiki.apache.org/confluence/display/Hive/Locking] can be used as a starting
point or perhaps updated.
* Implement methods and types in the {{HCatClient}} API that allow users to manipulate Hive
locks. For the most part I'd expect these to delegate to the following metastore API methods and types:
** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)}}
** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.checkLock(long)}}
** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)}}
** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.showLocks()}}
** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.heartbeat(long, long)}}
** {{org.apache.hadoop.hive.metastore.api.LockComponent}}
** {{org.apache.hadoop.hive.metastore.api.LockRequest}}
** {{org.apache.hadoop.hive.metastore.api.LockResponse}}
** {{org.apache.hadoop.hive.metastore.api.LockLevel}}
** {{org.apache.hadoop.hive.metastore.api.LockType}}
** {{org.apache.hadoop.hive.metastore.api.LockState}}
** {{org.apache.hadoop.hive.metastore.api.ShowLocksResponse}}
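
As a rough illustration of how the explicit lock calls listed above might be used together (request, poll, release), here is a minimal Java sketch. It deliberately does not use the real Hive classes: {{LockApi}}, {{LockStatus}}, {{InMemoryLockApi}} and {{ExplicitLocking}} are hypothetical stand-ins invented so that the example runs on its own, and the real metastore methods work with the richer request and response types listed above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified stand-in for the states a lock request can be in. */
enum LockStatus { ACQUIRED, WAITING }

/** Hypothetical, simplified view of the lock calls an HCatClient might expose. */
interface LockApi {
    long lock(String resource);        // request a lock, returns a lock id
    LockStatus checkLock(long lockId); // poll the state of an earlier request
    void unlock(long lockId);          // release a lock or withdraw a request
}

/** In-memory exclusive locks, present only to make the sketch runnable. */
final class InMemoryLockApi implements LockApi {
    private final AtomicLong ids = new AtomicLong();
    private final Map<Long, String> requests = new HashMap<>(); // lock id -> resource
    private final Map<String, Long> holders = new HashMap<>();  // resource -> holding lock id

    public synchronized long lock(String resource) {
        long id = ids.incrementAndGet();
        requests.put(id, resource);
        holders.putIfAbsent(resource, id); // granted immediately if the resource is free
        return id;
    }

    public synchronized LockStatus checkLock(long lockId) {
        String resource = requests.get(lockId);
        if (resource == null) {
            throw new IllegalArgumentException("unknown lock " + lockId);
        }
        holders.putIfAbsent(resource, lockId); // take the lock if it has been freed
        return holders.get(resource) == lockId ? LockStatus.ACQUIRED : LockStatus.WAITING;
    }

    public synchronized void unlock(long lockId) {
        String resource = requests.remove(lockId);
        if (resource != null) {
            holders.remove(resource, lockId); // only frees the resource if we hold it
        }
    }
}

final class ExplicitLocking {
    /** Requests a lock and polls until it is granted or the attempts run out. */
    static long acquire(LockApi api, String resource, int maxAttempts) {
        long id = api.lock(resource);
        for (int i = 0; i < maxAttempts; i++) {
            if (api.checkLock(id) == LockStatus.ACQUIRED) {
                return id;
            }
            // A real client would sleep or back off between polls.
        }
        api.unlock(id); // give up and withdraw the pending request
        throw new IllegalStateException("could not lock " + resource);
    }

    /** Runs the action while holding the lock and always releases it afterwards. */
    static void withLock(LockApi api, String resource, Runnable action) {
        long id = acquire(api, resource, 3);
        try {
            action.run();
        } finally {
            api.unlock(id);
        }
    }
}
```

The {{try}}/{{finally}} in {{withLock}} is the part that is easy to get wrong when locks are managed by hand: a forgotten release leaves the resource locked for every other client.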

h3. Additional proposals
Explicit lock management should be fairly simple to add to {{HCatClient}}; however, it puts
the onus on the API user to correctly understand and implement code that uses locks in an appropriate
manner. Failure to do so may have undesirable consequences. With a simpler user model, the
operations exposed on the API would automatically acquire and release the locks that they
need. This might work well for small numbers of operations, but perhaps not for large sequences
of invocations. (Do we need to worry about this, though, given that the API methods usually accept batches?)
Additionally, tasks such as heartbeat management could be handled implicitly for long-running
sets of operations. With these concerns in mind, it may also be beneficial to deliver
some of the following:
* A means to automatically acquire/release appropriate locks for {{HCatClient}} operations.
* A component that maintains a lock heartbeat from the client.
* A strategy for switching between manual/automatic lock management, analogous to SQL's {{autocommit}}
for transactions.
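
To make the three items above more concrete, here is a minimal Java sketch of how they could fit together. Everything in it is an assumption made for illustration: {{LockBackend}}, {{RecordingBackend}}, {{HeldLock}}, {{LockingClient}}, {{setAutoLock}} and {{operate}} are hypothetical names, not existing HCatalog or metastore APIs, and the backend only counts calls rather than talking to a metastore.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical minimal lock backend; a real one would delegate to the metastore. */
interface LockBackend {
    long acquire(String table);
    void heartbeat(long lockId);
    void release(long lockId);
}

/** Counts calls so that the sketch can run without a metastore. */
final class RecordingBackend implements LockBackend {
    final AtomicInteger acquired = new AtomicInteger();
    final AtomicInteger heartbeats = new AtomicInteger();
    final AtomicInteger released = new AtomicInteger();

    public long acquire(String table) { return acquired.incrementAndGet(); }
    public void heartbeat(long lockId) { heartbeats.incrementAndGet(); }
    public void release(long lockId) { released.incrementAndGet(); }
}

/** Holds a lock, heartbeats it from a background thread, releases it on close. */
final class HeldLock implements AutoCloseable {
    private final LockBackend backend;
    private final long lockId;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "lock-heartbeat");
            t.setDaemon(true); // never keep the JVM alive just for heartbeats
            return t;
        });

    HeldLock(LockBackend backend, String table, long periodMillis) {
        long id = backend.acquire(table);
        this.backend = backend;
        this.lockId = id;
        scheduler.scheduleAtFixedRate(
            () -> backend.heartbeat(id), periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();  // stop scheduling further heartbeats
        backend.release(lockId);  // then give the lock back
    }
}

/** Client with a switch between automatic and manual lock management. */
final class LockingClient {
    private final LockBackend backend;
    private final long heartbeatMillis;
    private boolean autoLock = true; // analogous to autocommit being on by default

    LockingClient(LockBackend backend, long heartbeatMillis) {
        this.backend = backend;
        this.heartbeatMillis = heartbeatMillis;
    }

    void setAutoLock(boolean autoLock) { this.autoLock = autoLock; }

    /** Runs a (hypothetical) DDL-style operation against a table. */
    void operate(String table, Runnable work) {
        if (!autoLock) {
            work.run(); // manual mode: the caller is responsible for locking
            return;
        }
        try (HeldLock ignored = new HeldLock(backend, table, heartbeatMillis)) {
            work.run(); // automatic mode: lock and heartbeat wrap the operation
        }
    }
}
```

In this sketch a lock is taken per operation, which is where the concern about long sequences of invocations shows up: a caller issuing many operations would switch to manual mode with {{setAutoLock(false)}} and hold a single lock across the whole sequence.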

An API for lock and heartbeat management already exists in the HCatalog Mutation API (see:
{{org.apache.hive.hcatalog.streaming.mutate.client.lock}}). It will likely make sense to refactor
this code, the code that uses it, or both.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
