hive-dev mailing list archives

From Alan Gates <>
Subject Re: Allow other implementations of IMetaStoreClient in Hive
Date Tue, 15 Dec 2015 19:04:06 GMT
For work along the same lines you should check out the HBase metastore 
work in Hive 2.0.  It still uses the thrift server and RawStore but puts 
HBase behind it instead of an RDBMS.  We did this because we found that 
most of the inefficiencies of Hive's metadata access had to do with the 
layout of the RDBMS and the way it was accessed.  In the same work I 
built in short-circuit options that avoid thrift and enable sharing 
of objects across HiveMetaStore and HiveMetaStoreClient.

On the backwards incompatibilities, yes, IMetaStoreClient evolves in lock 
step with the thrift interface.  My point was that we often add calls, add 
new fields to structs, etc.  Your code would still compile in these 
cases; new features just wouldn't work.  Given that a couple of major 
Hadoop support vendors now support rolling upgrades, there are devs 
interested in making sure that client version x works properly with 
server version x+1.

Still, we don't test for the use case you are proposing so we could end 
up breaking your code without knowing it.

When I said it wasn't external, I meant we did not expect end users to 
write code against it (as they do against, say, the UDF interface).  Yes, 
it's external to the metastore package, as you point out.


> Austin Lee <>
> December 15, 2015 at 10:46
> Yes, a more efficient implementation is what I am trying to achieve.  
> I also want to retain the ability to talk to a remote metastore that 
> is not necessarily thrift.
> To be more precise, what I would like is a more efficient metastore.  
> In looking at the current architecture, I came to a conclusion that 
> there are three logical boundaries where I can inject an improved 
> implementation or alternative to what Hive offers in the metastore space.
> 1) RawStore
> I think the existing mechanism that Hive offers users to choose from 
> major RDBMSes works fine.  I suppose there's still room for 
> improvement here, but the impact of those improvements would be 
> limited to the storage aspects of metadata.
> 2) Thrift server
> An alternative HiveMetaStore that talks Hive Metastore Thrift.  It's 
> almost a coin toss between this and #3, but I think for the reasons I 
> will state below, #3 is preferable.
> 3) IMetaStoreClient
> I feel this gives me the most freedom since I can be embedded or 
> remote.  I am not tied to the Thrift interface or the RawStore 
> interface, if I choose to roll my own.
> One thing that does concern me is your statement about 
> IMetaStoreClient being an internal interface, which is true.  Do the 
> changes to this interface really happen ad-hoc?  Doesn't it evolve in 
> lock step with the Thrift interface?  If so, wouldn't backward 
> compatibility guarantees for Thrift translate to backward 
> compatibility guarantees for this interface as well?  From the way it 
> is used by Query Planning, I think it could be made an "external" 
> interface that belongs in hive-metastore.
>
> Alan Gates <>
> December 15, 2015 at 10:14
> I don't see an issue with this, it seems fine.  One caveat though is 
> we see this as an internal interface and we change it all the time.  I 
> wouldn't want to be pushed into making backwards compatibility 
> guarantees for IMetaStoreClient.  Which means that if you develop a 
> different implementation of it outside Hive it will likely break on 
> every upgrade.
> I don't understand your example use case.  You can run Hive now 
> without the thrift server, so I'm guessing that's not what you're 
> really trying to do.  Are you just interested in building a more 
> efficient implementation or do you have another use case in mind?
> Alan.
>
> Austin Lee <>
> December 14, 2015 at 20:48
> Hi,
> I would like to propose a change that would make it possible for users to
> choose an implementation of IMetaStoreClient via HiveConf, i.e.
> hive-site.xml. Currently, in Hive the choice is hard coded to be
> SessionHiveMetaStoreClient in org.apache.hadoop.hive.ql.metadata.Hive.
> There is no other direct reference to SessionHiveMetaStoreClient other than
> the hard coded class name in and the QL component operates only
> on the IMetaStoreClient interface so the change would be minimal and it
> would be quite similar to how an implementation of RawStore is specified
> and loaded in hive-metastore. One use case this change would serve would
> be one where a user wishes to use an implementation of this interface
> without the dependency on the Thrift server. I would appreciate the
> community's input and feedback on this proposal.
> Thank you,
> Austin
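
The proposal at the bottom of the thread (pick the client implementation by class name from configuration, much as a RawStore implementation is chosen) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hive code: `MetaClient`, `DefaultClient`, `CustomClient`, and the key `example.metastore.client.impl` are hypothetical stand-ins for IMetaStoreClient, SessionHiveMetaStoreClient, an alternative implementation, and a hive-site.xml property, and a plain `java.util.Properties` stands in for HiveConf.

```java
import java.util.Properties;

public class ClientLoaderSketch {

    /** Hypothetical stand-in for IMetaStoreClient; the real interface is far larger. */
    public interface MetaClient {
        String describe();
    }

    /** Hypothetical default, playing the role of the currently hard-coded class. */
    public static class DefaultClient implements MetaClient {
        public DefaultClient() {}
        @Override public String describe() { return "default"; }
    }

    /** Hypothetical alternative implementation supplied by a user. */
    public static class CustomClient implements MetaClient {
        public CustomClient() {}
        @Override public String describe() { return "custom"; }
    }

    /** Hypothetical configuration key; not an actual Hive property. */
    public static final String IMPL_KEY = "example.metastore.client.impl";

    /**
     * Looks up the implementation class name in the configuration, falling
     * back to the default, and instantiates it through its no-arg constructor.
     */
    public static MetaClient load(Properties conf) throws ReflectiveOperationException {
        String className = conf.getProperty(IMPL_KEY, DefaultClient.class.getName());
        // asSubclass rejects classes that do not implement the interface.
        Class<? extends MetaClient> cls =
                Class.forName(className).asSubclass(MetaClient.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        System.out.println(load(conf).describe());

        conf.setProperty(IMPL_KEY, CustomClient.class.getName());
        System.out.println(load(conf).describe());
    }
}
```

Because callers only ever see the interface type, swapping the implementation touches the single place where the class is instantiated, which is the "minimal change" argument made in the proposal. The caveat from the reply still applies: an out-of-tree implementation compiled against one version of the interface is not guaranteed to keep working after an upgrade.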