hadoop-general mailing list archives

From Eli Collins <...@cloudera.com>
Subject HEP proposal
Date Mon, 12 Jul 2010 19:39:45 GMT
A while back we started discussing on list (http://bit.ly/aFj9Ya) and
at the contributors meeting (http://bit.ly/aj4Y7I) a more coordinated
way to describe, socialize and shepherd enhancements to Hadoop.
Thanks for all the feedback.  Most of it was encouraging so I wrote up
a draft proposal with specifics to discuss here.  After incorporating
feedback I'll send out another revision for vote.


HEP: 1
Title: HEP Purpose and Guidelines
Author: Eli Collins
Status: Draft

What is a HEP?

HEP stands for Hadoop Enhancement Proposal, and is based on Python's
PEP (Python Enhancement Proposal) [1].  A HEP is a document that
describes a new feature, its rationale, and the issues the feature
needs to address in order to be successfully incorporated.

The intent is for HEPs to be the primary mechanism for proposing
significant new features to core Hadoop (common, HDFS and MapReduce),
incorporating community feedback, and recording the proposal.  Going
through the HEP process should improve the chances that a proposal is
successful.

While HEPs do not need to come with code, they are a mechanism for
proposing features that the author intends to contribute, not for
requesting that the community implement a feature.

HEPs must be consistent with Apache bylaws [2]; for example, the HEP
workflow takes place on the public Apache Hadoop lists.

When is a HEP Required?

HEPs should not impede casual contribution to Hadoop.  Small
improvements and bugs do not require HEPs.  Not all features need
HEPs.  While the decision is subjective, here are some guidelines that
indicate when a HEP should be considered:

- The feature impacts backwards compatibility (eg modifies released
public APIs in an incompatible way).

- The feature requires that an existing component be substantially
re-designed (eg NameNode modified to use BookKeeper).

- The implementation impacts multiple parts of the system (eg symbolic
links versus adding a pluggable component like a codec).

- The feature impacts the entire development community (eg converts
the build system to use Maven).

HEP Workflow

The author of a HEP should first try to determine if their idea is
HEP-able by sending mail to the general list, or to the
project-specific list if the scope of the idea is limited to that
project.  This gives the author a chance to flesh out the proposal,
address initial concerns, and figure out whether it has a chance of
being accepted.  The author's role is to build consensus and gather
dissenting opinions.

After this discussion the author should draft a HEP using the HEP
template. The proposal should accurately reflect and
address feedback and dissenting opinions.  For example, flesh out
sections on backwards compatibility or testing. The author should send
the draft of the proposal to hep@hadoop.apache.org for review.  This
is a new, public list for editors and those interested in following
the review process.

A set of editors reviews incoming HEPs. Each HEP is assigned a single
primary editor. An editor may volunteer if they feel particular
functional expertise is required, or HEPs may be assigned to editors
round robin.

The editor reviews the proposal and may request that it be updated if
it does not sufficiently address feedback raised during discussion, eg
whether the proposal is redundant with existing functionality, is
technically sound, is sufficiently motivated, or covers backwards
compatibility. When updates are necessary, the HEP author can check
in new versions if they have commit permissions, or can email new HEP
versions to the editor for committing. In order to ensure HEP
proposals make progress the editor should respond to proposal drafts
within two weeks of receiving them (or the proposer can request
another editor), and the proposer should generate updates to the draft
within two weeks of receiving feedback from the editor.

The editor's role is to determine if the proposal is complete, so that
the proposal can be voted on, not whether they agree with the proposal
itself.  The editor's involvement should increase the chance that a
HEP proposal makes it to a vote.

Once the editor deems the proposal complete they add it to a
versioned HEP repository and the author posts the proposal to
general@hadoop.apache.org for vote.  HEP votes, like Apache procedural
votes, use majority rule [3]. Successful HEPs are assigned a number;
unsuccessful HEPs remain drafts.

The editors are appointed and removed by the PMC informally, similar to
how the Apache Board appoints shepherds to projects.

HEP Contents

Each HEP should contain the following:

1. Preamble -- Including the HEP number, a short descriptive title,
and the names of the authors.

2. Abstract -- A short (~200 word) description of the technical issue
being addressed.

3. Copyright/public domain -- Each HEP must be explicitly labelled as
placed in the public domain (see this HEP as an example).

4. Design -- A high-level explanation of the design. It should cover
intended use cases, failure scenarios, and impact on the existing
system.

5. Motivation -- The motivation spells out the use case for the
feature and the benefits it provides.

6. Rationale -- The rationale describes what motivated the design and
why particular design decisions were made.  It should describe
alternate designs that were considered and related work, e.g. how the
feature is designed in other systems. It should also consider whether
the feature could be achieved by layering atop the existing system
rather than modifying it.

The rationale should provide evidence of consensus within the
community and discuss important objections or concerns raised during

7. Backwards Compatibility -- All HEPs that introduce backwards
incompatibilities must include a section describing these
incompatibilities and their severity.  The HEP must explain how the
author proposes to deal with these incompatibilities.  HEP submissions
without a sufficient backwards compatibility treatise may be rejected

HEP Template

HEPs should be plain text with minimal structural markup that adheres
to a rigid style.  You can use this HEP as an example. Each HEP starts
with a header that contains the HEP number (or empty if the number has
not yet been assigned), title, list of authors and status (Draft,
Accepted, Rejected, or Withdrawn).
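
As an illustration only, the header described above could be checked
with a few lines of Python.  This sketch assumes the "Field: value"
layout used at the top of this HEP and that the first blank line ends
the header.  The function name parse_hep_header, the set of required
fields, and the use of None for an unassigned number are hypothetical
and not part of the proposal.

```python
VALID_STATUSES = {"Draft", "Accepted", "Rejected", "Withdrawn"}
REQUIRED_FIELDS = ("HEP", "Title", "Author", "Status")


def parse_hep_header(text):
    """Parse the leading 'Field: value' lines of a HEP into a dict.

    Hypothetical helper: the proposal lists the header fields but does
    not define a parser.
    """
    header = {}
    for line in text.splitlines():
        if not line.strip():
            # Assumption: the first blank line ends the header.
            break
        # Split on the first colon only, so titles may contain colons.
        field, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a header line: %r" % line)
        header[field.strip()] = value.strip()

    missing = [f for f in REQUIRED_FIELDS if f not in header]
    if missing:
        raise ValueError("missing header fields: %s" % ", ".join(missing))
    if header["Status"] not in VALID_STATUSES:
        raise ValueError("unknown status: %r" % header["Status"])

    # An empty number means the HEP has not been assigned one yet.
    header["HEP"] = int(header["HEP"]) if header["HEP"] else None
    return header
```

A draft whose "HEP:" field is left empty would come back with the
number set to None, and a status outside the four listed values would
be reported as an error.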

Auxiliary Files

HEPs may include auxiliary files such as diagrams.  Such files must be
named ``hep-XXXX-Y.ext``, where "XXXX" is the HEP number, "Y" is a
serial number (starting at 1), and "ext" is replaced by the actual
file extension (e.g. "png").
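
The naming rule above can be sketched as a small check.  The details
here are assumptions rather than part of the proposal: the HEP number
and serial number are taken to be plain decimal digits, the serial
number has no leading zero, and the extension is alphanumeric.  The
names AUX_FILE_RE and parse_aux_filename are hypothetical.

```python
import re

# Assumed pattern for hep-XXXX-Y.ext: digits for the HEP number, a serial
# number starting at 1, and a single alphanumeric extension.
AUX_FILE_RE = re.compile(r"hep-(\d+)-([1-9]\d*)\.([A-Za-z0-9]+)")


def parse_aux_filename(name):
    """Return (hep_number, serial, ext), or None if the name does not match."""
    match = AUX_FILE_RE.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)
```

Under these assumptions "hep-0001-1.png" is accepted, while
"hep-0001-0.png" is rejected because serial numbers start at 1.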


1. http://www.python.org/dev/peps/pep-0001

2. http://www.apache.org/foundation/bylaws.html

3. http://www.apache.org/foundation/voting.html


This document has been placed in the public domain.
