incubator-hcatalog-user mailing list archives

From Thomas Weise <>
Subject Re: HCatalog HA deployment
Date Thu, 08 Sep 2011 01:03:05 GMT
Preliminary thoughts on implementing HA for the HCatalog (Hive) metastore service:

* Multiple server instances will run behind a VIP
* Backing database will implement HA
* Metastore server instances will need to be able to share any state required for VIP outside

As of Hive 0.8, the conversational state that must be shared to support a VIP/HA setup is
limited to current delegation tokens. Is this correct?

We are considering ZooKeeper to share current delegation tokens between nodes of the VIP.
ZooKeeper is already (optionally) used by Hive for concurrency control. Access to ZooKeeper
would be limited at the network level or, in the future, when ZooKeeper supports security,
through Kerberos, similar to NN access.

Currently Hive taps into Hadoop core security delegation token support through extension of<TokenIdent>

A solution could amend this Hive specific extension to support:
* Pluggable delegation token store (with an implementation for ZooKeeper as an alternative to
the all-in-memory store found in AbstractDelegationTokenSecretManager)
* Fallback for delegation token retrieval from token store when not found in memory (wrap/extend
* Cancellation of token in token store
* Purging of expired tokens from token store
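
The four points above can be sketched in plain Java. This is only an illustration under stated assumptions: TokenStore, InMemoryTokenStore and TokenManager are hypothetical names, not Hive or Hadoop classes, and the in-memory store merely stands in for a shared store such as ZooKeeper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical pluggable store interface; not part of Hive or Hadoop. */
interface TokenStore {
    void put(String tokenId, long expiryMillis);
    Long getExpiry(String tokenId);      // null when the token is unknown
    boolean remove(String tokenId);      // cancellation
    int purgeExpired(long nowMillis);    // returns the number of tokens removed
}

/** In-memory stand-in for a shared store (assumption: ZooKeeper would replace this). */
class InMemoryTokenStore implements TokenStore {
    private final Map<String, Long> tokens = new ConcurrentHashMap<>();

    public void put(String tokenId, long expiryMillis) { tokens.put(tokenId, expiryMillis); }
    public Long getExpiry(String tokenId) { return tokens.get(tokenId); }
    public boolean remove(String tokenId) { return tokens.remove(tokenId) != null; }

    public int purgeExpired(long nowMillis) {
        int before = tokens.size();
        tokens.values().removeIf(expiry -> expiry <= nowMillis);
        return before - tokens.size();
    }
}

/** One metastore instance: a local in-memory map in front of the shared store. */
class TokenManager {
    private final Map<String, Long> local = new ConcurrentHashMap<>();
    private final TokenStore shared;

    TokenManager(TokenStore shared) { this.shared = shared; }

    /** Record a newly issued token both locally and in the shared store. */
    void issue(String tokenId, long expiryMillis) {
        local.put(tokenId, expiryMillis);
        shared.put(tokenId, expiryMillis);
    }

    /** Look in memory first; fall back to the shared store when not found. */
    boolean isValid(String tokenId, long nowMillis) {
        Long expiry = local.get(tokenId);
        if (expiry == null) {
            expiry = shared.getExpiry(tokenId);
            if (expiry != null) {
                local.put(tokenId, expiry);
            }
        }
        return expiry != null && expiry > nowMillis;
    }

    /** Cancel the token on this instance and in the shared store. */
    void cancel(String tokenId) {
        local.remove(tokenId);
        shared.remove(tokenId);
    }
}
```

With two TokenManager instances sharing one store, a token issued by the first is accepted by the second through the fallback lookup. The sketch does not propagate cancellation into the in-memory maps of other instances, so a real implementation would need to handle that case as well.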

Is this proposal going in the right direction, considering the overall Hadoop core and Hive
security architecture?

Should the JIRA ticket to be created, and future communication about this, be in the context
of the HCatalog project or Hive?


On Sep 1, 2011, at 1:47 PM, Devaraj Das wrote:

That's right, Thomas. The delegation tokens are issued by the metastore, and used by the commit-task
at the end of the job for committing the partitions in the metastore. A client will fail the
authentication at the server if the latter doesn't know about the token.

On Sep 1, 2011, at 1:38 PM, Thomas Weise wrote:

I assume it is the delegation token support added in 0.7 that needs to be looked at?

On Sep 1, 2011, at 12:45 PM, Thomas Weise wrote:


Can you explain a bit more where and why security tokens are kept on the Thrift server?

The communication to the metastore server through Thrift/SASL would use Kerberos. Is it correct
that this part is stateless, i.e. the next call going to another instance would repeat the
Kerberos authentication and no state needs to be tracked for the API access?

Is the token tracking related to authentication of the Thrift metastore server to other services?


On Sep 1, 2011, at 10:40 AM, Alan Gates wrote:

The Thrift server that HCatalog uses to service metastore requests is the other SPOF in HCat.
 In unsecure mode it does not track state and so starting two servers and putting them behind
a VIP should be fine.  However, to my knowledge no one has tested this setup and if you are
thinking of using it you should test it before you buy hardware, make installation plans,

In secure mode some of the security tokens are kept on the Thrift servers, and thus you cannot
use a VIP server in a round robin fashion.  If you could set it up such that the same client
went to the same server for the duration of their Kerberos tickets, then I think it would work
(again, test this, as no one has as far as I know).  In this scenario fail over would not
be seamless for users who were talking to the failed server.  They would get authentication
errors when they failed over and would be forced to restart.


On Aug 31, 2011, at 7:11 PM, Thomas Weise wrote:


I'm looking into HA support for HCatalog. We are going to have HA support at the metastore
RDBMS level. Beyond that, which areas of the server need to be looked at to accomplish failover
when running multiple HCatalog servers behind a VIP?

What state outside the database is maintained by HCatalog that needs to be available to other
instances to accomplish a VIP-based failover in a secure deployment?

