hive-dev mailing list archives

From Alan Gates <>
Subject Re: Allow other implementations of IMetaStoreClient in Hive
Date Tue, 15 Dec 2015 20:49:55 GMT
I think opening a JIRA is a good next step.


> Austin Lee <>
> December 15, 2015 at 11:19
> Thank you so much Alan for your prompt responses and for the 
> information you provided.  I will have a look at the HBase work.
> I am new to the process and it's not 100% clear to me, but the wiki 
> seems to suggest I should use this forum to get to consensus on a 
> proposal before creating a JIRA ticket.  If the "why" is clear on my 
> proposal, I would like to create a JIRA ticket and take this through 
> the rest of the process via JIRA.  Does that sound good?
> Thanks,
> Austin
> Alan Gates <>
> December 15, 2015 at 11:04
> For work along the same lines you should check out the HBase metastore 
> work in Hive 2.0.  It still uses the thrift server and RawStore but 
> puts HBase behind it instead of an RDBMS.  We did this because we 
> found that most of the inefficiencies of Hive's metadata access had to 
> do with the layout of the RDBMS and the way it was accessed.  In the 
> same work I built short-circuit options in to avoid using thrift and 
> enable sharing of objects across HiveMetaStore and HiveMetaStoreClient.
> On the backwards incompatibilities, yes IMetaStoreClient evolves in 
> lock step with the thrift interface.  My point was we often add calls, 
> add new fields to structs, etc.  Your code would still compile in 
> these cases, new features just wouldn't work.  Given that a couple 
> major Hadoop support vendors now support rolling upgrades, there are 
> devs interested in making sure that client version x works properly 
> with server version x+1.
> Still, we don't test for the use case you are proposing so we could 
> end up breaking your code without knowing it.
> When I said it wasn't external, I meant we did not expect end users to 
> write code against it (like say the UDF interface).  Yes it's external 
> to the metastore package as you point out.
> Alan.
> Austin Lee <>
> December 15, 2015 at 10:46
> Yes, a more efficient implementation is what I am trying to achieve.  
> I also want to retain the ability to talk to a remote metastore that 
> is not necessarily thrift.
> To be more precise, what I would like is a more efficient metastore.  
> In looking at the current architecture, I came to the conclusion that 
> there are three logical boundaries where I can inject an improved 
> implementation or alternative to what Hive offers in the metastore space.
> 1) RawStore
> I think the existing mechanism that Hive offers users to choose from 
> major RDBMSes works fine.  I suppose there's still room for 
> improvement here, but the impact of those improvements would be 
> limited to the storage aspects of metadata.
> 2) Thrift server
> An alternative HiveMetaStore that talks Hive Metastore Thrift.  It's 
> almost a coin toss between this and #3, but I think for the reasons I 
> will state below, #3 is preferable.
> 3) IMetaStoreClient
> I feel this gives me the most freedom since I can be embedded or 
> remote.  I am not tied to the Thrift interface or the RawStore 
> interface, if I choose to roll my own.
> One thing that does concern me is your statement about 
> IMetaStoreClient being an internal interface, which is true.  Do the 
> changes to this interface really happen ad-hoc?  Doesn't it evolve in 
> lock step with the Thrift interface?  If so, wouldn't backward 
> compatibility guarantees for Thrift translate to backward 
> compatibility guarantees for this interface as well?  From the way it 
> is used by Query Planning, I think it could be made an "external" 
> interface that belongs in hive-metastore.
> Alan Gates <>
> December 15, 2015 at 10:14
> I don't see an issue with this, it seems fine.  One caveat though is 
> we see this as an internal interface and we change it all the time.  I 
> wouldn't want to be pushed into making backwards compatibility 
> guarantees for IMetaStoreClient.  Which means that if you develop a 
> different implementation of it outside Hive it will likely break on 
> every upgrade.
> I don't understand your example use case.  You can run Hive now 
> without the thrift server, so I'm guessing that's not what you're 
> really trying to do.  Are you just interested in building a more 
> efficient implementation or do you have another use case in mind?
> Alan.
> Austin Lee <>
> December 14, 2015 at 20:48
> Hi,
> I would like to propose a change that would make it possible for users to
> choose an implementation of IMetaStoreClient via HiveConf, i.e.
> hive-site.xml. Currently, in Hive the choice is hard coded to be
> SessionHiveMetaStoreClient in org.apache.hadoop.hive.ql.metadata.Hive.
> There is no other direct reference to SessionHiveMetaStoreClient other than
> the hard coded class name in and the QL component operates only
> on the IMetaStoreClient interface so the change would be minimal and it
> would be quite similar to how an implementation of RawStore is specified
> and loaded in hive-metastore. One use case this change would serve would
> be one where a user wishes to use an implementation of this interface
> without the dependency on the Thrift server. I would appreciate the
> community's input and feedback on this proposal.
> Thank you,
> Austin
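
The original proposal above (choosing the client implementation by class name from configuration, much as a RawStore implementation is specified) can be illustrated with a minimal, self-contained sketch. This is not Hive code. The MetaClient interface stands in for IMetaStoreClient, java.util.Properties stands in for HiveConf, and the key "example.metastore.client.impl" is a hypothetical name invented for illustration, not an actual Hive property.

```java
import java.util.Properties;

public class ClientLoaderSketch {

    /** Hypothetical stand-in for IMetaStoreClient; the real interface is far larger. */
    public interface MetaClient {
        String describe();
    }

    /** Stand-in for the implementation that is currently hard coded. */
    public static class DefaultMetaClient implements MetaClient {
        @Override
        public String describe() {
            return "default";
        }
    }

    /** Stand-in for an alternative implementation supplied by a user. */
    public static class CustomMetaClient implements MetaClient {
        @Override
        public String describe() {
            return "custom";
        }
    }

    /** Hypothetical configuration key; not an actual Hive property name. */
    public static final String IMPL_KEY = "example.metastore.client.impl";

    /**
     * Reads the implementation class name from the configuration, falling back
     * to the default, and instantiates it through its no-argument constructor.
     */
    public static MetaClient load(Properties conf) throws ReflectiveOperationException {
        String className = conf.getProperty(IMPL_KEY, DefaultMetaClient.class.getName());
        // asSubclass rejects classes that do not implement the interface.
        Class<? extends MetaClient> cls =
                Class.forName(className).asSubclass(MetaClient.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        System.out.println(load(conf).describe());

        conf.setProperty(IMPL_KEY, CustomMetaClient.class.getName());
        System.out.println(load(conf).describe());
    }
}
```

Because the caller only ever sees the interface, swapping the implementation needs no change outside the loader. As noted in the replies, an implementation maintained outside the project would still be exposed to any change in the interface itself.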
