hadoop-hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zheng Shao <zsh...@gmail.com>
Subject Re: [DISCUSSION] To be (or not to be) a TLP - that is the question
Date Tue, 20 Apr 2010 21:03:33 GMT
As a Hive committer, I don't feel the benefit we get from becoming a
TLP is big enough (compared with the cost) to make Hive a TLP.
>From Chris's comment I see that the cost is not that big, but I still
wonder what benefit we will get from that.

Also I didn't get the idea of the joke ("In fact, one could argue that
Pig opting not to be TLP yet is why Hive should go TLP"). I don't see
any reasons that applies to Pig but not Hive.
We should continue the discussion here, but anything in the Pig's
discussion should also be considered here.

Zheng

On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah <aaa@cloudera.com> wrote:
> I am personally +1 on Hive being a TLP, I think it did reach the community
> adoption and maturity level required for that. In fact, one could argue that
> Pig opting not to be TLP yet is why Hive should go TLP :) (jk).
>
> The real question to ask is whether there is a volunteer to take care of the
> "administrative" tasks, which isn't a ton of work afaiu (I am willing to
> volunteer if no body else up to the task, but I am not a committer and only
> contributed a minor patch for bash/cygwin).
>
> BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP
> tradeoffs. I happen to agree with all he says, and frankly I couldn't have
> wrote it better my self. I highlight certain parts from his message, but I
> recommend you read the whole thing.
>
> ---------- Forwarded message ----------
> From: Chris Douglas <cdouglas@apache.org>
> Date: Tue, Apr 13, 2010 at 11:46 PM
> Subject: Subprojects and TLP status
> To: general@hadoop.apache.org, private@hadoop.apache.org
>
> Most of Hadoop's subprojects have discussed becoming top-level Apache
> projects (TLPs) in the last few weeks. Most have expressed a desire to
> remain in Hadoop. The salient parts of the discussions I've read tend
> to focus on three aspects: a technical dependence on Hadoop,
> additional overhead as a TLP, and visibility both within the Hadoop
> ecosystem and in the open source community generally.
>
> Life as a TLP: this is not much harder than being a Hadoop subproject,
> and the Apache preferences being tossed around- particularly
> "insufficiently diverse"- are not blockers. Every subproject needs to
> write a section of the report Hadoop sends to the board; almost the
> same report, sent to a new address. The initial cost is similarly
> light: copy bylaws, send a few notes to INFRA, and follow some
> directions. I think the estimated costs are far higher than they will
> be in practice. Inertia is a powerful force, but it should be
> overcome. The directions are here, and should not intimidating:
>
> http://apache.org/dev/project-creation.html
>
> Visibility: the Hadoop site does not need to change. For each
> subproject, we can literally change the hyperlinks to point to the new
> page and be done. Long-term, linking to all ASF projects that run on
> Hadoop from a prominent page is something we all want. So particularly
> in the medium-term that most are considering: visibility through the
> website will not change. Each subproject will still be linked from the
> front page.
>
> Hadoop would not be nearly as popular as it is without Zookeeper,
> HBase, Hive, and Pig. All statistics on work in shared MapReduce
> clusters show that users vastly prefer running Pig and Hive queries to
> writing MapReduce jobs. HBase continues to push features in HDFS that
> increase its adoption and relevance outside MapReduce, while sharing
> some of its NoSQL limelight. Zookeeper is not only a linchpin in real
> workloads, but many proposals for future features require it. The
> bottom line is that MapReduce and HDFS need these projects for
> visibility and adoption in precisely the same way. I don't think
> separate TLPs will uncouple the broader community from one another.
>
> Technical dependence: this has two dimensions. First, influencing
> MapReduce and HDFS. This is nonsense. Earning influence by
> contributing to a subproject is the only way to push code changes;
> nobody from any of these projects has violated that by unilaterally
> committing to HDFS or MapReduce, anyway. And anyone cynical enough to
> believe that MapReduce and HDFS would deliberately screw over or
> ignore dependent projects because they don't have PMC members is
> plainly unsuited to community-driven development. I understand that
> these projects need to protect their users, but lobbying rights are
> not an actual benefit.
>
> Second, being a coherent part of the Hadoop ecosystem. It is (mostly)
> true that Hadoop currently offers a set of mutually compatible
> frameworks. It is not true that moving them to separate Apache
> projects would make solutions less coherent or affect existing or
> future users at all. The cohesion between projects' governance is
> sufficiently weak to justify independent units, but the real
> dependencies between the projects are strong enough to keep us engaged
> with one another. And it's not as if other projects- Cascading, for
> example- aren't also organisms adapted and specialized for life in
> Hadoop.
>
> Arguments on technical dependence are ignoring the nature of the
> existing interactions. Besides, weak technical dependencies are not a
> necessary prerequisite for a subproject's independence.
>
> As for what was *not* said in these discussions, there is no argument
> that every one of these subprojects has a distinct, autonomous
> community. There was also no argument that the Hadoop PMC offers any
> valuable oversight, given that the representatives of its fiefdoms are
> too consumed by provincial matters to participate in neighboring
> governance. Most releases I've voted on: I run the unit tests, check
> the signature, verify the checksum, and know literally nothing else
> about its content. I have often never heard the names of many proposed
> committers and even some proposed PMC members. Right now, subprojects
> with enough PMC members essentially vote out their own releases and
> vote in their own committers: TLPs in all but name.
>
> The Hadoop club- in conferences, meetups, technical debates, etc.- is
> broad, diverse, and intertwined, but communities of developers have
> already clustered around subprojects. Allowing that each cluster
> should govern itself is a dry, practical matter, not an existential
> crisis. -C
>
> On 4/19/2010 11:57 AM, Ashish Thusoo wrote:
>>
>> Hi Folks,
>>
>> Recently Apache Board asked the Hadoop PMC if some sub projects can become
>> top level projects. In the opinion of the board, big umbrella projects make
>> it difficult to monitor the health of the communities within the sub
>> projects. If Hive does become a TLP, then we would have to elect our own PMC
>> and take on all the administrative tasks that the Hadoop PMC does for us. So
>> there is definitely more administrative work involved as a TLP. So the
>> question is whether we should take on this additional task keeping at this
>> time and what tangible advantages and disadvantages would such a move entail
>> for the project. Would like to hear what the community thinks on this issue.
>>
>> Thanks,
>> Ashish
>>
>> PS: As some reference to what is happening in the other subprojects, at
>> this time PIG and Zookeeper have decided NOT to become TLPs where as Hbase
>> and Avro have decided to become TLPs.
>>
>>
>



-- 
Yours,
Zheng
http://www.linkedin.com/in/zshao

Mime
View raw message