hive-user mailing list archives

From Alan Gates <alanfga...@gmail.com>
Subject Re: Transactional table read lifecycle
Date Wed, 22 Apr 2015 20:06:33 GMT
Whether you obtain a read lock depends on the guarantees you want to 
make to your readers.  Obtaining the lock will do a couple of things 
your users might want:
1) It will prevent DDL statements such as DROP TABLE from removing the 
data while they are reading it.
2) It will prevent the compactor from removing the versions of the delta 
files they are reading.

The other step you'll want is to heartbeat the lock.  To avoid dead 
clients holding locks forever, the DbLockManager times locks out after 
300 seconds (the default; it's configurable).  To keep your lock from 
timing out during a long read you'll need to call 
IMetaStoreClient.heartbeat on a regular basis.
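
A minimal sketch of the heartbeating described above, not a definitive implementation. The `Heartbeat` interface is a hypothetical stand-in for `IMetaStoreClient.heartbeat(txnid, lockid)` so that the example compiles without Hive on the classpath; `LockHeartbeater`, the lock id and the interval are likewise illustrative. Passing 0 as the transaction id for a plain read lock is an assumption here. In real use the interval would be chosen comfortably below the lock timeout.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for IMetaStoreClient.heartbeat(txnid, lockid). */
interface Heartbeat {
    void beat(long txnId, long lockId) throws Exception;
}

/** Sends a heartbeat for one lock at a fixed interval until closed. */
class LockHeartbeater implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lock-heartbeat");
                t.setDaemon(true); // do not keep the JVM alive just for heartbeats
                return t;
            });
    private final AtomicInteger failures = new AtomicInteger();

    LockHeartbeater(Heartbeat client, long lockId, long interval, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // Assumption: txn id 0 means "no open transaction, lock only".
                client.beat(0L, lockId);
            } catch (Exception e) {
                // Swallow so the schedule keeps running; callers can poll failures().
                failures.incrementAndGet();
            }
        }, 0, interval, unit);
    }

    /** Number of heartbeat attempts that threw. */
    int failures() {
        return failures.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}

class LockHeartbeatDemo {
    public static void main(String[] args) throws Exception {
        AtomicInteger beats = new AtomicInteger();
        // The lambda plays the role of the metastore client; 42 is a made-up lock id.
        try (LockHeartbeater hb = new LockHeartbeater(
                (txnId, lockId) -> beats.incrementAndGet(), 42L, 20, TimeUnit.MILLISECONDS)) {
            Thread.sleep(200); // stands in for the long-running read
        }
        System.out.println("heartbeats sent: " + beats.get());
    }
}
```

The heartbeater is `AutoCloseable` so that it can wrap the read in a try-with-resources block and stop as soon as the read finishes or fails; the lock itself still has to be released separately.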

Alan.

> Elliot West <mailto:teabot@gmail.com>
> April 17, 2015 at 8:05
> Hi, I'm working on a Cascading Tap that reads the data that backs a 
> transactional Hive table. I've successfully utilised the in-built 
> OrcInputFormat functionality to read and merge the deltas with the 
> base and optionally pull in the RecordIdentifiers. However, I'm now 
> considering what other steps I may need to take to collaborate with an 
> active Hive instance that could be writing to or compacting the table 
> as I'm trying to read it.
>
> I recently became aware of the need to obtain a list of valid 
> transaction IDs but now wonder if I must also acquire a read lock for 
> the table? I'm thinking that the set of interactions for reading this 
> data may look something like:
>
>  1. Obtain ValidTxnList from the meta store:
>     org.apache.hadoop.hive.metastore.IMetaStoreClient.getValidTxns()
>
>  2. Set the ValidTxnList in the Configuration:
>     conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.toString());
>
>  3. Acquire a read lock:
>     org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)
>
>  4. Use OrcInputFormat to read the data
>
>  5. Finally, release the lock:
>     org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)
>
> Can you advise on whether the lock is needed, whether this is the 
> correct way of managing the lock, and whether there are any other 
> steps I need to take to appropriately interact with the data underpinning 
> a 'live' transactional table?
>
> Thanks - Elliot.
>
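
The five steps in the quoted message can be sketched as follows, under stated assumptions. `TxnMetaStore` and `TableReader` are hypothetical, simplified stand-ins for the `IMetaStoreClient` calls and the `OrcInputFormat` read named in the thread, so the example compiles without Hive; the real signatures differ (for example, `lock` takes a `LockRequest`). The order of calls follows the quoted message and is not a statement about which order Hive requires. The point illustrated is only that the unlock belongs in a `finally` block so a failed read does not leave the lock to time out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the IMetaStoreClient calls in the thread. */
interface TxnMetaStore {
    String getValidTxns() throws Exception;               // getValidTxns().toString()
    long lock(String db, String table) throws Exception;  // lock(LockRequest)
    void unlock(long lockId) throws Exception;            // unlock(long)
}

/** Hypothetical reader callback; in the thread this role is played by OrcInputFormat. */
interface TableReader<T> {
    T read(String validTxns) throws Exception;
}

class TransactionalRead {
    static <T> T readWithLock(TxnMetaStore store, String db, String table,
                              TableReader<T> reader) throws Exception {
        String validTxns = store.getValidTxns();  // step 1
        long lockId = store.lock(db, table);      // step 3
        try {
            // Steps 2 and 4: the reader is expected to put validTxns into its
            // Configuration before reading.
            return reader.read(validTxns);
        } finally {
            store.unlock(lockId);                 // step 5, even if the read throws
        }
    }
}

class TransactionalReadDemo {
    public static void main(String[] args) throws Exception {
        List<String> calls = new ArrayList<>();
        // In-memory fake that only records the calls it receives; values are made up.
        TxnMetaStore fake = new TxnMetaStore() {
            public String getValidTxns() { calls.add("getValidTxns"); return "valid-txns"; }
            public long lock(String db, String table) { calls.add("lock"); return 1L; }
            public void unlock(long lockId) { calls.add("unlock"); }
        };
        TransactionalRead.readWithLock(fake, "default", "my_table", txns -> {
            calls.add("read");
            return null;
        });
        System.out.println(calls);
    }
}
```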
