hadoop-general mailing list archives

From Eli Collins <...@cloudera.com>
Subject Re: HEP proposal
Date Wed, 14 Jul 2010 19:56:07 GMT
On Wed, Jul 14, 2010 at 12:12 PM, Konstantin Boudnik <cos@yahoo-inc.com> wrote:
> I have been following this discussion for some time now and the only question
> that came to my mind is: why mimic PEP? Is it so astonishingly successful, or
> is it much better than the Apache voting or RFC process (from which it was
> apparently derived)?

I thought it would be more fruitful to adapt an existing, working
model rather than invent a new one. I based it on PEP after looking at
what other projects used; PEP seems to strike a balance between no
structure and more heavyweight processes (JSR, RFC).  I'm open to
other models if there's something you think is more suitable.

> So far I see HEP as an over-complicated process for process's sake. I'd
> appreciate it if someone can chip in and tell me if and where I'm wrong.

I think that's a reasonable opinion. There are communities that
function without process.

In the limited time I've been working on Hadoop there's been tension
between substantially modifying the system and preserving its
stability. It seemed to me that some additional structure around
discussing change up front would help relieve some of that tension.
There is value in thinking and discussing what you're doing up front
in a way that's visible to the community. HEP essentially enforces
that. If that happens naturally I agree we don't need the process. If
people think what we've currently got is sufficient I'm happy to chalk
it up as a learning experience; I don't want people to adopt it just
because I've taken the time to draft something.


> Thanks,
>  Cos
> On Wed, Jul 14, 2010 at 10:46AM, Eli Collins wrote:
>>    Hey Konstantin,
>>    Thanks for taking a look, comments in-line.
>>    On Tue, Jul 13, 2010 at 1:54 PM, Konstantin Shvachko <shv@yahoo-inc.com>
>>    wrote:
>>    > Eli,
>>    >
>>    > Thanks for a really good proposal.
>>    > Some questions / comments:
>>    >
>>    > On voting
>>    > 1. Which voting rule?
>>    > http://www.apache.org/foundation/glossary.html#ConsensusApproval
>>    > http://www.apache.org/foundation/glossary.html#MajorityApproval
>>    > I think you mean MajorityApproval, as it does not have a veto rule.
>>    > So maybe it's just a matter of clarifying the reference.
>>    Good point, clarified so it's majority approval.
>>    > 2. Who can vote?
>>    > Usually PMCs have Binding Votes.
>>    > Would be good to have a sentence clarifying this.
>>    Yup, added.
>>    > 3. How long does the vote go?
>>    > Usual 3 days may not be enough. One week is reasonable?
>>    Specified one week.
>>    > 4. Discussion on public lists.
>>    > A HEP can evolve from a jira, then it should be counted as a public
>>    > discussion. I think it makes sense even to continue the discussion
>>    > there if so.
>>    Agreed, changed the wording to "If the scope of the idea is limited to
>>    a specific project the discussion may happen on the project-specific
>>    list or jira."
>>    > 5. How is the set of editors selected?
>>    >   "The editors are apointed and removed by the PMC informally, similar
>>    >   to how the Apache Board appoints shepherds to projects."
>>    > This needs a reference. How does the Apache Board appoint shepherds?
>>    Good question, anyone know? Since it's informal I imagine shepherds
>>    volunteer. The editors could be a subset of the PMC that either
>>    volunteers or is rotated periodically.
>>    > 6. The level of design details.
>>    > I think HEP should have a pretty detailed design. When people vote they
>>    > will want to be sure the design can lead to a reasonable implementation.
>>    > Should we say "implementation-ready design", rather than
>>    > "A high-level explanation of the design."
>>    > Or just
>>    > "A _detailed_ explanation of the design."
>>    Rewrote this section, tried to make it more explicit about giving both
>>    a high-level view and complete enough description so the design can
>>    lead to a reasonable implementation. Also added that this section
>>    should cover how to test the design.
>>    > 7. Typos:
>>    > successuflly, apointed, intial
>>    Fixed.
>>    Updated draft follows.
>>    Thanks,
>>    Eli
>>    HEP: 1
>>    Title: HEP Purpose and Guidelines
>>    Author: Eli Collins
>>    Status: Draft
>>    What is a HEP?
>>    ==============
>>    HEP stands for Hadoop Enhancement Proposal, and is based on Python's
>>    PEP (Python Enhancement Proposal) [1].  A HEP is a document that
>>    describes a new feature, its rationale, and issues the feature needs
>>    to address in order to be successfully incorporated.
>>    The intent is for HEPs to be the primary mechanism for proposing
>>    significant new features to core Hadoop (common, HDFS and MapReduce),
>>    incorporating community feedback, and recording the proposal.  Going
>>    through the HEP process should improve the chances that a proposal is
>>    successful.
>>    While HEPs do not need to come with code, they are a mechanism to
>>    propose features to the community, with the intent of contributing the
>>    feature, rather than to request that the community implement it.
>>    HEPs must be consistent with Apache bylaws [2]; for example, the HEP
>>    workflow takes place on the public Apache Hadoop lists.
>>    When is a HEP Required?
>>    =======================
>>    HEPs should not impede casual contribution to Hadoop.  Small
>>    improvements and bugs do not require HEPs.  Not all features need
>>    HEPs.  While the decision is subjective, here are some guidelines to
>>    indicate a HEP should be considered:
>>    - The feature impacts backwards compatibility (e.g. modifies released
>>    public APIs in an incompatible way).
>>    - The feature requires that an existing component be substantially
>>    re-designed (e.g. NameNode modified to use BookKeeper).
>>    - The implementation impacts multiple parts of the system (e.g. symbolic
>>    links versus adding a pluggable component like a codec).
>>    - The feature impacts the entire development community (e.g. converts
>>    the build system to use Maven).
>>    HEP Workflow
>>    ============
>>    The author of a HEP should first try to determine if their idea is
>>    HEP-able by sending mail to the general list.  If the scope of the
>>    idea is limited to a specific project the discussion may happen on the
>>    project-specific list or jira.  This gives the author a chance to
>>    flesh out the proposal, address initial concerns, and figure out
>>    whether it has a chance of being accepted.  The author's role is to
>>    build consensus, and gather dissenting opinions.
>>    Following this discussion the author should draft a HEP proposal
>>    following the HEP template. The proposal should accurately reflect and
>>    address feedback and dissenting opinions.  For example, flesh out
>>    sections on backwards compatibility or testing. The author should send
>>    the draft of the proposal to hep@hadoop.apache.org for review.  This
>>    is a new, public list for editors and those interested in following
>>    the review process.
>>    A set of editors reviews incoming HEPs. Each HEP is assigned a single
>>    primary editor. An editor may volunteer if they feel particular
>>    functional expertise is required; otherwise HEPs are assigned to
>>    editors round-robin.
>>    The editor reviews the proposal and may request that it be updated if
>>    it does not sufficiently address feedback raised during discussion,
>>    e.g. why the proposal is not redundant with existing functionality,
>>    or whether it is technically sound, sufficiently motivated, and covers
>>    backwards compatibility. When updates are necessary, the HEP author can check
>>    in new versions if they have commit permissions, or can email new HEP
>>    versions to the editor for committing. In order to ensure HEP
>>    proposals make progress the editor should respond to proposal drafts
>>    within two weeks of receiving them (or the proposer can request
>>    another editor), and the proposer should generate updates to the draft
>>    within two weeks of receiving feedback from the editor.
>>    The editor's role is to determine if the proposal is complete, so that
>>    the proposal can be voted on, not whether they agree with the proposal
>>    itself.  The editor's involvement should increase the chance that a
>>    HEP proposal makes it to a vote.
>>    Once the editor deems the proposal complete they add it to a
>>    versioned HEP repository and the author posts the proposal to
>>    general@hadoop.apache.org for vote.  HEP votes, like Apache procedural
>>    votes, use majority approval [3]. Only PMC members have binding votes.
>>    Votes are open for a period of 1 week to allow all active voters time
>>    to consider the proposal. Successful HEPs are assigned a number;
>>    unsuccessful HEPs remain drafts.
>>    The editors are appointed and removed by the PMC informally, similar
>>    to how the Apache Board appoints shepherds to projects.
>>    HEP Contents
>>    ============
>>    Each HEP should contain the following:
>>    1. Preamble -- Including the HEP number, a short descriptive title,
>>    and the names of the authors.
>>    2. Abstract -- A short (~200 word) description of the technical issue
>>    being addressed.
>>    3. Copyright/public domain -- Each HEP must be explicitly
>>    labelled as placed in the public domain (see this HEP as an example).
>>    4. Design -- This section should give both a high-level view and a
>>    complete description of the feature.  While the design does not need
>>    to cover implementation detail it should be clear to the reader that
>>    the design can lead to a reasonable implementation.  This section
>>    should cover intended use cases, failure scenarios, strategies for
>>    testing, and impact on the existing system.
>>    5. Motivation -- The motivation spells out the use case for the
>>    feature and the benefits it provides.
>>    6. Rationale -- The rationale describes what motivated the design and
>>    why particular design decisions were made.  It should describe
>>    alternate designs that were considered and related work, e.g. how the
>>    feature is designed in other systems. It should also consider whether
>>    the feature could be achieved by layering atop the existing system
>>    rather than modifying it.
>>    The rationale should provide evidence of consensus within the
>>    community and discuss important objections or concerns raised during
>>    discussion.
>>    7. Backwards Compatibility -- All HEPs that introduce backwards
>>    incompatibilities must include a section describing these
>>    incompatibilities and their severity.  The HEP must explain how the
>>    author proposes to deal with these incompatibilities.  HEP submissions
>>    without a sufficient backwards compatibility treatise may be rejected
>>    outright.
>>    HEP Template
>>    ============
>>    HEPs should be plain text with minimal structural markup that adheres
>>    to a rigid style.  You can use this HEP as an example. Each HEP starts
>>    with a header that contains the HEP number (left empty if the number has
>>    not yet been assigned), title, list of authors and status (Draft,
>>    Accepted, Rejected, or Withdrawn).
>>    Auxiliary Files
>>    ===============
>>    HEPs may include auxiliary files such as diagrams.  Such files must be
>>    named ``hep-XXXX-Y.ext``, where "XXXX" is the HEP number, "Y" is a
>>    serial number (starting at 1), and "ext" is replaced by the actual
>>    file extension (e.g. "png").
>>    References
>>    ==========
>>    1. http://www.python.org/dev/peps/pep-0001
>>    2. http://www.apache.org/foundation/bylaws.html
>>    3. http://www.apache.org/foundation/glossary.html#MajorityApproval
>>    Copyright
>>    =========
>>    This document has been placed in the public domain.
