hive-dev mailing list archives

From Alan Gates <ga...@hortonworks.com>
Subject Re: [DISCUSS] HCatalog becoming a subproject of Hive
Date Fri, 21 Dec 2012 01:51:33 GMT
Namit,

I was not proposing that promotion to full committership would be automatic.  I assume it
would still be done via a vote by the PMC.  I agree that we cannot _guarantee_ committership
for HCat committers in 6-9 months.  But I am trying to lay out a clear path they can follow.
 If they don't follow the path then they won't be committers.  I am also trying to make it
non-preferential in that I am setting the criteria to be what I believe the Hive PMC would
expect any prospective Hive committer to do.  The only intended preferential part of the proposal
is the Hive shepherds, which we have all agreed is a good idea.

Alan.

On Dec 19, 2012, at 8:23 PM, Namit Jain wrote:

> I don’t agree with the proposal. It is impractical to have an Hcat committer
> with commit access to Hcat-only portions of Hive. We cannot guarantee that
> an Hcat committer will become a Hive committer in 6-9 months; that depends
> on what they do in the next 6-9 months.
> 
> The current Hcat committers should spend more time reviewing patches,
> work on non-Hcat areas in Hive, and then gradually become Hive
> committers. They should not be given any preferential treatment, and the
> process should be the same as it would be for any other Hive contributor
> currently. Given the expertise of the Hcat committers, they should
> be in line to become Hive committers if they continue to work in Hive,
> but that cannot be guaranteed. I agree that some Hive committers should try
> to help with the existing Hcat patches, and again that is voluntary, and
> different committers cannot be assigned to different parts of the code.
> 
> Thanks,
> -namit
> 
> On 12/20/12 1:03 AM, "Carl Steinbach" <cwsteinbach@gmail.com> wrote:
> 
>> Alan's proposal sounds like a good idea to me.
>> 
>> +1
>> 
>> On Dec 18, 2012 5:36 PM, "Travis Crawford" <traviscrawford@gmail.com>
>> wrote:
>> 
>>> Alan, I think your proposal sounds great.
>>> 
>>> --travis
>>> 
>>> On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates <gates@hortonworks.com>
>>> wrote:
>>>> Carl, speaking just for myself and not as a representative of the HCat
>>>> PPMC at this point, I am coming to agree with you that HCat integrating
>>>> with Hive fully makes more sense.
>>>>
>>>> However, this makes the committer question even thornier.  Travis and
>>>> Namit, I think the shepherd proposal needs to lay out a clear and time
>>>> bounded path to committership for HCat committers.  Having HCat
>>>> committers as second class Hive citizens for the long run will not be
>>>> healthy.  I propose the following as a starting point for discussion:
>>>>
>>>> All active HCat committers (those who have contributed or committed a
>>>> patch in the last 6 months) will be made committers in the HCat portion
>>>> only of Hive.  In addition those committers will be assigned a particular
>>>> shepherd who is a current Hive committer and who will be responsible for
>>>> mentoring them towards full Hive committership.  As a part of this
>>>> mentorship the HCat committer will review patches of other contributors,
>>>> contribute patches to Hive (both inside and outside of HCatalog), respond
>>>> to user issues on the mailing lists, etc.  It is intended that as a result
>>>> of this mentorship program HCat committers can become full Hive committers
>>>> in 6-9 months.  No new HCat only committers will be elected in Hive after
>>>> this.  All Hive committers will automatically also have commit rights on
>>>> HCatalog.
>>>> 
>>>> Alan.
>>>> 
>>>> On Dec 14, 2012, at 10:05 AM, Carl Steinbach wrote:
>>>> 
>>>>> On a functional level I don't think there is going to be much of a
>>>>> difference between the subproject option proposed by Travis and the other
>>>>> option where HCatalog becomes a TLP. In both cases HCatalog and Hive will
>>>>> have separate committers, separate code repositories, separate release
>>>>> cycles, and separate project roadmaps. Aside from ASF bureaucracy, I think
>>>>> the only major difference between the two options is that the subproject
>>>>> route will give the rest of the community the false impression that the
>>>>> two projects have coordinated roadmaps and a process to prevent
>>>>> overlapping functionality from appearing in both projects. Consequently,
>>>>> if these are the only two options then I would prefer that HCatalog
>>>>> become a TLP.
>>>>> 
>>>>> On the other hand, I also agree with many of the sentiments that have
>>>>> already been expressed in this thread, namely that the two projects are
>>>>> closely related and that it would benefit the community at large if the
>>>>> two projects could be brought closer together. Up to this point the major
>>>>> source of pain for the HCatalog team has been the frequent necessity of
>>>>> making changes on both the Hive and HCatalog sides when implementing new
>>>>> features in HCatalog. This situation is compounded by the ASF requirement
>>>>> that release artifacts may not depend on snapshot artifacts from other
>>>>> ASF projects. Furthermore, if Hive adds a dependency on HCatalog then it
>>>>> will be subject to these same problems (in addition to the gross circular
>>>>> dependency!).
>>>>> 
>>>>> I think the best way to avoid these problems is for HCatalog to become a
>>>>> Hive submodule. In this scenario HCatalog would exist as a subdirectory
>>>>> in the Hive repository and would be distributed as a Hive artifact in
>>>>> future Hive releases. In addition to solving the problems I mentioned
>>>>> earlier, I think this would also help to assuage the concerns of many
>>>>> Hive committers who don't want to see the MetaStore split out into a
>>>>> separate project.
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> Carl
>>>>> 
>>>>> On Thu, Dec 13, 2012 at 7:59 PM, Namit Jain <njain@fb.com> wrote:
>>>>> 
>>>>>> I am fine with this. Any hive committer who wants to volunteer to be
>>>>>> an hcat shepherd is welcome.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 12/14/12 7:01 AM, "Travis Crawford" <traviscrawford@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Thanks for reviving this thread. Reviewing the comments everyone seems
>>>>>>> to agree HCatalog makes sense as a Hive subproject. I think that's
>>>>>>> great news for the Hadoop community.
>>>>>>> 
>>>>>>> The discussion seems to have turned to one of committer permissions. I
>>>>>>> agree with the Hive folks' sentiment that it's something that must be
>>>>>>> earned. That said, I've found it challenging at times getting patches
>>>>>>> into Hive that would help earn taking on a hive committer
>>>>>>> responsibility.
>>>>>>> 
>>>>>>> Proposal: if a couple hive committers can volunteer to be hcat
>>>>>>> shepherds, we can work with the shepherds when making hive changes in
>>>>>>> a timely manner. Conversely, we can help shepherd any hive committers
>>>>>>> who are interested in working more with hcat. There are certainly
>>>>>>> benefits to cross-committership, and this approach could help each
>>>>>>> other build a history of meaningful contributions and earn the
>>>>>>> privilege & responsibility of being committers.
>>>>>>> 
>>>>>>> Thoughts?
>>>>>>> 
>>>>>>> --travis
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Dec 13, 2012 at 11:59 AM, Edward Capriolo
>>>>>>> <edlinuxguru@gmail.com> wrote:
>>>>>>>> I was initially hesitant about hcatalog, mostly because I imagined we
>>>>>>>> would end up in a spot very similar to this.
>>>>>>>> 
>>>>>>>> Namely, the hcatalog folks are interested in making a metastore to
>>>>>>>> support pig, hive, and map reduce. However, I get the impression that
>>>>>>>> many in hive do not care much to have a metastore that caters to
>>>>>>>> everyone. Their needs are only based on what hive needs, which I
>>>>>>>> believe is the wrong way to look at this situation.
>>>>>>>> 
>>>>>>>> I thought to reply to this thread because I have been following this
>>>>>>>> Jira:
>>>>>>>> https://issues.apache.org/jira/browse/HIVE-3752
>>>>>>>> 
>>>>>>>> On a high level I do not like this duplication of effort and code. If
>>>>>>>> hive is compatible with hcatalog I do not see why we put off merging
>>>>>>>> the two at all. Hive users would get an immediate benefit if Hive used
>>>>>>>> hcatalog with no apparent downside. Meanwhile we are putting this off
>>>>>>>> and staying in this awkward transition phase.
>>>>>>>> 
>>>>>>>> Personally, I do not have a problem being a hive committer and not
>>>>>>>> having hcatalog commit. None of the hive work I have done has ever
>>>>>>>> touched the metastore. Also, of the thousands of jiras and features we
>>>>>>>> have added, only a small portion require metastore changes.
>>>>>>>> 
>>>>>>>> As long as a couple active users have commit on hive and the suggested
>>>>>>>> hcatalog subproject I do not think not having commit will be a
>>>>>>>> roadblock in moving hive forward.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Dec 3, 2012 at 6:22 PM, Alan Gates <gates@hortonworks.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I am not sure where we are on this discussion.  So far those who have
>>>>>>>>> chimed in seemed generally positive (Namit, Edward, Clark, Alexander).
>>>>>>>>> Namit and I have different visions for what the committership might
>>>>>>>>> look like, so I'd like to hear from other Hive PMC members what their
>>>>>>>>> view is on this.  I have to say from an HCatalog perspective the
>>>>>>>>> proposition is much less attractive without some commit rights.
>>>>>>>>> 
>>>>>>>>> On a related note, people should be aware of these threads in the
>>>>>>>>> Incubator list:
>>>>>>>>>
>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAGU5spdWHNtJxgQ8f%3DnPEXx9xNLjyjOYaFfnSw4EyAjgm1c46w%40mail.gmail.com%3E
>>>>>>>>>
>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201211.mbox/%3CCAKQbXgDZj_zMj4qSodXjMHV7xQZxpcY1-35cvq959YKLNd6tJQ%40mail.gmail.com%3E
>>>>>>>>> 
>>>>>>>>> For those not inclined to read all the mails in the threads I will
>>>>>>>>> summarize (though I urge all PMC members of Hive and PPMC members of
>>>>>>>>> HCat to read both mail threads because this is highly relevant to what
>>>>>>>>> we are discussing).  There are two salient points in these threads:
>>>>>>>>> 
>>>>>>>>> 1) It is not wise to build a subproject that is distinct from the main
>>>>>>>>> project in the sense that it has separate community members interested
>>>>>>>>> in it.  Bertrand, Arun, Chris Mattmann, and Greg Stein all spoke
>>>>>>>>> against this, and all are long time Apache contributors with a lot of
>>>>>>>>> experience.  They were all of the opinion that it was reasonable for
>>>>>>>>> one project to release separate products.
>>>>>>>>> 
>>>>>>>>> 2) It is not wise to have committers that have access to parts of a
>>>>>>>>> project but not others.  Greg and Bertrand argued (and Arun seemed to
>>>>>>>>> imply) that splitting up committer lists by sections of the code did
>>>>>>>>> not work out well.
>>>>>>>>> 
>>>>>>>>> These insights cause me to question what we mean by subproject.  I had
>>>>>>>>> originally envisioned something that looked like Pig and Hive did when
>>>>>>>>> they were subprojects of Hadoop.  But this violates both 1 and 2
>>>>>>>>> above.  Given this input from many of the "wise old timers" of Apache
>>>>>>>>> I think we should consider what we mean when we say subproject and how
>>>>>>>>> tightly we are willing to integrate these projects.  Personally I
>>>>>>>>> think it makes sense to continue to pursue integration, as I think
>>>>>>>>> HCat is really a set of interfaces on top of Hive and it makes sense
>>>>>>>>> to coalesce those into one project.  I guess this would mean HCat
>>>>>>>>> becomes just another set of jars that Hive releases when it releases,
>>>>>>>>> rather than a stand alone entity.  But I'm curious to hear what others
>>>>>>>>> think.
>>>>>>>>> 
>>>>>>>>> Alan.
>>>>>>>>> 
>>>>>>>>> On Nov 14, 2012, at 10:22 PM, Namit Jain wrote:
>>>>>>>>> 
>>>>>>>>>> The same criteria should be applied to all Hive committers. Only a
>>>>>>>>>> committer should be able to commit code.
>>>>>>>>>> I don't think we should bend this rule. Metastore is not a separate
>>>>>>>>>> project, but an integral part of hive.
>>>>>>>>>> 
>>>>>>>>>> -namit
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 11/12/12 10:32 PM, "Alan Gates" <gates@hortonworks.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I would suggest looking over the patch history of HCat committers.  I
>>>>>>>>>>> think most of them have already contributed a number of patches to the
>>>>>>>>>>> metastore.  All are certainly aware of how to run Hive unit tests and
>>>>>>>>>>> have an understanding of how Hive works.  So I don't think it's fair
>>>>>>>>>>> to say they would be unsafe with access to the metastore.  And the
>>>>>>>>>>> Hive PMC is there to assure this does not happen.  If there are issues
>>>>>>>>>>> I am sure they can deal with them.
>>>>>>>>>>> 
>>>>>>>>>>> Alan.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Nov 6, 2012, at 8:06 PM, Namit Jain wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Alan, that would not be a good idea. Metastore code is part of hive
>>>>>>>>>>>> code, and it would be safer if only Hive committers had commit access
>>>>>>>>>>>> to that.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 11/6/12 11:25 PM, "Alan Gates" <gates@hortonworks.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 4, 2012, at 8:35 PM, Namit Jain wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I like the idea of Hcatalog becoming a Hive sub-project. The
>>>>>>>>>>>>>> enhancements/bugs in the serde/metastore areas can indirectly
>>>>>>>>>>>>>> benefit the hive community, and it will be easier for the fix to be
>>>>>>>>>>>>>> in one place. Having said that, I don't see serde/metastore
>>>>>>>>>>>>>> moving out of hive into a separate component. Things are tied too
>>>>>>>>>>>>>> closely together. I am assuming that no new committers would
>>>>>>>>>>>>>> be automatically added to Hive as part of this, and both Hive and
>>>>>>>>>>>>>> HCatalog will continue to have their own committers.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> One thing in this we'd like to discuss is the HCatalog committers
>>>>>>>>>>>>> having commit access to the metastore sections of Hive code.  That
>>>>>>>>>>>>> doesn't mean it has to move into HCatalog's code base.  But more and
>>>>>>>>>>>>> more the fixes and changes we're doing in HCatalog are really in
>>>>>>>>>>>>> Hive's metastore.  So we believe it would make sense to give HCat
>>>>>>>>>>>>> committers access to that component as well as HCat.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Alan.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> -namit
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 11/3/12 2:22 AM, "Alan Gates" <gates@hortonworks.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hello Hive community.  It is time for HCatalog to graduate from the
>>>>>>>>>>>>>>> Apache Incubator.  Given the heavy dependence of HCatalog on Hive the
>>>>>>>>>>>>>>> HCatalog community agreed it made sense to explore graduating from
>>>>>>>>>>>>>>> the Incubator to become a subproject of Hive (see
>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201209.mbox/%3C08C40723-8D4D-48EB-942B-8EE4327DD84A%40hortonworks.com%3E
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-hcatalog-user/201210.mbox/%3CCABN7xTCRM5wXGgJKEko0PmqDXhuAYpK%2BD-H57T29zcSGhkwGQw%40mail.gmail.com%3E
>>>>>>>>>>>>>>> ).  To help both communities understand what HCatalog is and hopes
>>>>>>>>>>>>>>> to become we also developed a roadmap that summarizes HCatalog's
>>>>>>>>>>>>>>> current features, planned features, and other possible features
>>>>>>>>>>>>>>> under discussion:
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/HCATALOG/HCatalog+Roadmap
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> So we are now approaching you to see if there is agreement in the
>>>>>>>>>>>>>>> Hive community that HCatalog graduating into Hive would make sense.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Alan.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>> 
> 

