hadoop-general mailing list archives

From Eli Collins <...@cloudera.com>
Subject [DISCUSSION] Proposal for making core Hadoop changes
Date Fri, 21 May 2010 20:42:28 GMT
As HDFS and MapReduce have matured, the cost and complexity of
introducing features has grown. Each new feature has to consider
interactions with a growing set of existing features, a growing user
base (upgrades, backwards compatibility) and additional use cases
(more and more projects now build on them). At the same time we don't
want the high bar for contribution to unnecessarily hinder new
development and releases.

Many projects at a similar stage address this by adopting a more
formal way to describe, socialize and shepherd enhancements to their
platforms. Today, new features are often discussed via an umbrella
jira, which may have an attached design document. There are a number
of issues with this approach. The design documents vary in format and
quality, and are often reviewed by a limited audience. They aren't
version controlled. Sometimes the proposal is only partially
specified. Jiras are often ignored. Understanding a proposal and its
implications through a series of threads in the jira comments is
difficult. It's hard for contributors and users to find these
top-level jiras and follow their status.

I'd like to propose that core Hadoop adopt something similar to
Python's PEP (Python Enhancement Proposal) [1]. A "HEP" would be a
single primary mechanism for proposing new features, incorporating
community feedback, and recording decisions. The author of the HEP
would be responsible for building consensus and moving the feature
forward. Similarly, some subset of the community would be responsible
for reviewing HEPs in a timely manner and identifying missing pieces
in the proposal. Discussion would occur before patches showed up on
jira. People interested in the core Hadoop roadmap could keep an eye
on the HEPs without the overhead of following jira traffic.

Why base this on the PEP? The format has proven useful to a
substantial existing project, and I think the workflow is not too
heavyweight and is well suited to a community such as ours. That being
said, we could discuss other models (e.g. Java's JSR).

Before we get into specifics, is this something the community would
like to adopt in some form? Does adapting the PEP and its workflow to
our projects, community and bylaws seem reasonable?


1. http://www.python.org/dev/peps/pep-0001
