hive-dev mailing list archives

From Travis Crawford <>
Subject Re: [DISCUSS] HCatalog becoming a subproject of Hive
Date Wed, 19 Dec 2012 01:36:02 GMT
Alan, I think your proposal sounds great.


On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates <> wrote:
> Carl, speaking just for myself and not as a representative of the HCat PPMC at this point,
I am coming to agree with you that HCat integrating with Hive fully makes more sense.
> However, this makes the committer question even thornier.  Travis and Namit, I think
the shepherd proposal needs to lay out a clear and time bounded path to committership for
HCat committers.  Having HCat committers as second class Hive citizens for the long run will
not be healthy.  I propose the following as a starting point for discussion:
> All active HCat committers (those who have contributed or committed a patch in the last
6 months) will be made committers in the HCat portion only of Hive.  In addition those committers
will be assigned a particular shepherd who is a current Hive committer and who will be responsible
for mentoring them towards full Hive committership.  As a part of this mentorship the HCat
committer will review patches of other contributors, contribute patches to Hive (both inside
and outside of HCatalog), respond to user issues on the mailing lists, etc.  It is intended
that as a result of this mentorship program HCat committers can become full Hive committers
in 6-9 months.  No new HCat only committers will be elected in Hive after this.  All Hive
committers will automatically also have commit rights on HCatalog.
> Alan.
> On Dec 14, 2012, at 10:05 AM, Carl Steinbach wrote:
>> On a functional level I don't think there is going to be much of a
>> difference between the subproject option proposed by Travis and the other
>> option where HCatalog becomes a TLP. In both cases HCatalog and Hive will
>> have separate committers, separate code repositories, separate release
>> cycles, and separate project roadmaps. Aside from ASF bureaucracy, I think
>> the only major difference between the two options is that the subproject
>> route will give the rest of the community the false impression that the two
>> projects have coordinated roadmaps and a process to prevent overlapping
>> functionality from appearing in both projects. Consequently, if these are
>> the only two options then I would prefer that HCatalog become a TLP.
>> On the other hand, I also agree with many of the sentiments that have
>> already been expressed in this thread, namely that the two projects are
>> closely related and that it would benefit the community at large if the two
>> projects could be brought closer together. Up to this point the major
>> source of pain for the HCatalog team has been the frequent necessity of
>> making changes on both the Hive and HCatalog sides when implementing new
>> features in HCatalog. This situation is compounded by the ASF requirement
>> that release artifacts may not depend on snapshot artifacts from other ASF
>> projects. Furthermore, if Hive adds a dependency on HCatalog then it will
>> be subject to these same problems (in addition to the gross circular
>> dependency!).
>> I think the best way to avoid these problems is for HCatalog to become a
>> Hive submodule. In this scenario HCatalog would exist as a subdirectory in
>> the Hive repository and would be distributed as a Hive artifact in future
>> Hive releases. In addition to solving the problems I mentioned earlier, I
>> think this would also help to assuage the concerns of many Hive committers
>> who don't want to see the MetaStore split out into a separate project.
>> Thanks.
>> Carl
>> On Thu, Dec 13, 2012 at 7:59 PM, Namit Jain <> wrote:
>>> I am fine with this. Any hive committer who wants to volunteer to be
>>> an hcat shepherd is welcome.
>>> On 12/14/12 7:01 AM, "Travis Crawford" <> wrote:
>>>> Thanks for reviving this thread. Reviewing the comments everyone seems
>>>> to agree HCatalog makes sense as a Hive subproject. I think that's
>>>> great news for the Hadoop community.
>>>> The discussion seems to have turned to one of committer permissions. I
>>>> agree with the Hive folks' sentiment that it's something that must be
>>>> earned. That said, I've found it challenging at times getting patches
>>>> into Hive that would help earn taking on a hive committer
>>>> responsibility.
>>>> Proposal: if a couple hive committers can volunteer to be hcat
>>>> shepherds, we can work with the shepherds when making hive changes in
>>>> a timely manner. Conversely, we can help shepherd any hive committers
>>>> who are interested in working more with hcat. There are certainly
>>>> benefits to cross-committership, and this approach could help each
>>>> other build a history of meaningful contributions and earn the
>>>> privilege & responsibility of being committers.
>>>> Thoughts?
>>>> --travis
>>>> On Thu, Dec 13, 2012 at 11:59 AM, Edward Capriolo <> wrote:
>>>>> I was initially hesitant about hcatalog, mostly because I imagined we
>>>>> would end up in a spot very similar to this.
>>>>> Namely, the hcatalog folks are interested in making a metastore to
>>>>> support pig, hive, and map reduce. However, I get the impression that
>>>>> many in hive do not care much to have a metastore that caters to
>>>>> everyone. Their needs are only based on what hive needs, which I
>>>>> believe is the wrong way to look at this situation.
>>>>> I thought to reply to this thread because I have been following this
>>>>> Jira:
>>>>> On a high level I do not like this duplication of effort and code. If
>>>>> hive is compatible with hcatalog I do not see why we put off merging
>>>>> the two at all. Hive users would get an immediate benefit if Hive used
>>>>> hcatalog with no apparent downside. Meanwhile we are putting this off
>>>>> and staying in this awkward transition phase.
>>>>> Personally, I do not have a problem being a hive committer and not
>>>>> having hcatalog commit. None of the hive work I have done has ever
>>>>> touched the metastore. Also, of the thousands of jiras and features we
>>>>> have added, only a small portion require metastore changes.
>>>>> As long as a couple of active users have commit on hive and the
>>>>> suggested hcatalog subproject, I do not think not having commit will be
>>>>> a roadblock in moving hive forward.
>>>>> On Mon, Dec 3, 2012 at 6:22 PM, Alan Gates <> wrote:
>>>>>> I am not sure where we are on this discussion.  So far those who
>>>>>> chimed in seemed generally positive (Namit, Edward, Clark, Alexander).
>>>>>> Namit and I have different visions for what the committership might
>>>>>> look like, so I'd like to hear from other Hive PMC members what their
>>>>>> view is on this.  I have to say from an HCatalog perspective the
>>>>>> proposition is much less attractive without some commit rights.
>>>>>> On a related note, people should be aware of these threads in the
>>>>>> Incubator list:
>>>>>> For those not inclined to read all the mails in the threads I will
>>>>>> summarize (though I urge all PMC members of Hive and PPMC members of
>>>>>> HCat to read both mail threads because this is highly relevant to what
>>>>>> we are discussing).  There are two salient points in these threads:
>>>>>> 1) It is not wise to build a subproject that is distinct from the
>>>>>> project in the sense that it has separate community members interested
>>>>>> in it.  Bertrand, Arun, Chris Mattman, and Greg Stein all spoke against
>>>>>> this, and all are long time Apache contributors with a lot of
>>>>>> experience.  They were all of the opinion that it was reasonable for
>>>>>> one project to release separate products.
>>>>>> 2) It is not wise to have committers that have access to parts of the
>>>>>> project but not others.  Greg and Bertrand argued (and Arun seemed to
>>>>>> imply) that splitting up committer lists by sections of the code does
>>>>>> not work out well.
>>>>>> These insights cause me to question what we mean by subproject.  I had
>>>>>> originally envisioned something that looked like Pig and Hive did when
>>>>>> they were subprojects of Hadoop.  But this violates both 1 and 2 above.
>>>>>> Given this input from many of the "wise old timers" of Apache I think
>>>>>> we should consider what we mean when we say subproject and how tightly
>>>>>> we are willing to integrate these projects.  Personally I think it
>>>>>> makes sense to continue to pursue integration, as I think HCat is
>>>>>> really a set of interfaces on top of Hive and it makes sense to
>>>>>> coalesce those into one project.  I think this would mean HCat becomes
>>>>>> just another set of jars that Hive releases when it releases, rather
>>>>>> than a stand alone entity.  But I'm curious to hear what others think.
>>>>>> Alan.
>>>>>> On Nov 14, 2012, at 10:22 PM, Namit Jain wrote:
>>>>>>> The same criteria should be applied to all Hive committers. Only a
>>>>>>> committer should be able to commit code.
>>>>>>> I don't think we should bend this rule. Metastore is not a separate
>>>>>>> project, but an integral part of hive.
>>>>>>> -namit
>>>>>>> On 11/12/12 10:32 PM, "Alan Gates" <>
>>>>>>>> I would suggest looking over the patch history of HCat committers.  I
>>>>>>>> think most of them have already contributed a number of patches to the
>>>>>>>> metastore.  All are certainly aware of how to run Hive unit tests and
>>>>>>>> have an understanding of how Hive works.  So I don't think it's fair to
>>>>>>>> say they would be unsafe with access to the metastore.  And the Hive PMC
>>>>>>>> is there to assure this does not happen.  If there are issues I am sure
>>>>>>>> they can deal with them.
>>>>>>>> Alan.
>>>>>>>> On Nov 6, 2012, at 8:06 PM, Namit Jain wrote:
>>>>>>>>> Alan, that would not be a good idea. Metastore code is part of hive
>>>>>>>>> code, and it would be safer if only Hive committers had commit access
>>>>>>>>> to that.
>>>>>>>>> On 11/6/12 11:25 PM, "Alan Gates" <>
>>>>>>>>>> On Nov 4, 2012, at 8:35 PM, Namit Jain wrote:
>>>>>>>>>>> I like the idea of Hcatalog becoming a Hive sub-project.
>>>>>>>>>>> Enhancements/bugs in the serde/metastore areas can indirectly
>>>>>>>>>>> benefit the hive community, and it will be easier for the fix to be
>>>>>>>>>>> in one place. Having said that, I don't see serde/metastore
>>>>>>>>>>> moving out of hive into a separate component. Things are tied too
>>>>>>>>>>> closely together. I am assuming that no new committers will
>>>>>>>>>>> be automatically added to Hive as part of this, and both Hive and
>>>>>>>>>>> HCatalog will continue to have their own committers.
>>>>>>>>>> One thing in this we'd like to discuss is the HCatalog committers
>>>>>>>>>> having commit access to the metastore sections of Hive code.  This
>>>>>>>>>> doesn't mean it has to move into HCatalog's code base.  But more and
>>>>>>>>>> more the fixes and changes we're doing in HCatalog are really in the
>>>>>>>>>> metastore.  So we believe it would make sense to give HCat committers
>>>>>>>>>> access to that component as well as HCat.
>>>>>>>>>> Alan.
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -namit
>>>>>>>>>>> On 11/3/12 2:22 AM, "Alan Gates" <>
>>>>>>>>>>>> Hello Hive community.  It is time for HCatalog to graduate from the
>>>>>>>>>>>> Apache Incubator.  Given the heavy dependence of HCatalog on Hive the
>>>>>>>>>>>> HCatalog community agreed it made sense to explore graduating from
>>>>>>>>>>>> the Incubator to become a subproject of Hive.  To help both
>>>>>>>>>>>> communities understand what HCatalog is and hopes to become we also
>>>>>>>>>>>> developed a roadmap that summarizes HCatalog's current features,
>>>>>>>>>>>> planned features, and other possible features under discussion:
>>>>>>>>>>>> So we are now approaching you to see if there is agreement in the
>>>>>>>>>>>> Hive community that HCatalog graduating into Hive would make sense.
>>>>>>>>>>>> Alan.
