hive-dev mailing list archives

From Alan Gates <alanfga...@gmail.com>
Subject Re: [DISCUSS] Separating out the metastore as its own TLP
Date Wed, 05 Jul 2017 17:51:33 GMT
On Mon, Jul 3, 2017 at 6:20 AM, Edward Capriolo <edlinuxguru@gmail.com>
wrote:

>
> We already have things in the meta-store not directly tied to language
> features. For example hive metastore has a "retention" property which is
> not actively in use by anything. In reality, we rarely say 'no' or -1 to
> much. Which in part is why I believe our release process is grinding
> slower: we have so many things in flight I do not feel that any one person
> can keep track. You are working on porting the metastore to hbase.
> https://issues.apache.org/jira/browse/HIVE-9452 did you get a -1 or 'No'
> along the way? When I first noticed this I pointed out that someone has
> already ported the metastore to Cassandra
> https://github.com/riptano/brisk/blob/master/src/java/
> src/org/apache/cassandra/hadoop/hive/metastore/SchemaManagerService.java,
> but I was more exciting/rational for this multi-year approach using hbase
> so I let everyone 'have at it'.
>
Your example and mine are not equivalent.  The HBase metastore is still a
Hive feature, even if some thought it not worthwhile.  That is different
than people bringing features that will never interest Hive or that Hive
could never use (e.g. Dain’s desire for the metastore to support Presto
style views).

I forgot to mention the issue these would-be non-Hive contributors have
with releases if they contribute their features to the metastore while it’s
inside Hive.  Is Hive going to do a release just to push out features in
the metastore that it doesn’t care about?

You seem to be asserting that doing this doesn’t really help non-Hive based
systems that are using or would like to use the metastore.  But it is
interesting that people from three of those systems have commented in the
thread so far, and all are positive (Dmitrias from Impala, Dain from
Presto, and Sriharsha from the schema registry project).


> I am going to give a hypothetical but real world situation. Suppose I want
> to add the statement "CREATE permanent macro xyz", this feature I believe
> would cross cut calcite, hive, and hive metastore. To build this feature I
> would need to orchestrate the change across 3 separate groups of hive
> 'subcommittees' for lack of a better word. 3 git repos, 3 Jira's 3
> releases. That is not counting if we run into some bug or misfeature (maybe
> with Tez or something else) so that brings in 4-5 releases of upstream to
> add a feature to hive. This does not take into account normal processes
> mess ups. For example say you get the metastore done, but now the people
> doing the calcite/antlr suggest the feature have different syntax because
> they did not read the 3-4 linked tickets when the process started? Now, you
> have to loop back around the process. Finding 1 person in 1 project to
> usher along the feature you want is difficult, having to find and clear
> time with 3 people across three projects is going to be a difficult along
> with then 'pushing' them all to kick out a release so you can finally use
> said feature.
>

I partially agree with you.  On the reviews, JIRAs, etc., I don’t think it
adds much, if any, overhead.  Hive is a big project and no one person knows
all the code anymore.  If you wanted to add a permanent macro feature you
would need reviews from someone who knows the parser (probably Pengcheng),
people who know the optimizer (Jesus, Ashutosh, …), and someone who knows
the metastore (me, Thejas, …).  And any large feature is going to be
implemented over multiple JIRAs, all of which are linkable regardless of
whether the JIRAs start with METASTORE- or HIVE-.  I also don’t think it
makes the feature disagreement any worse.  If the optimizer team absolutely
insists it has to have some feature and the metastore team insists that it
can’t have that feature, you’re going to have to work through the issue
whether they all are in Hive or in two separate projects.

Where I agree the split adds cost is releases.  Before your macro feature
could go live you need releases from each of the components.  And while in
development the components need to use snapshot versions of the other
components.  My assertion is that the benefits outweigh this cost.

Alan.
