Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-dev@hadoop.apache.org
Message-ID: <1998765138.1232140319732.JavaMail.jira@brutus>
Date: Fri, 16 Jan 2009 13:11:59 -0800 (PST)
From: "Sanjay Radia (JIRA)" <jira@apache.org>
To: core-dev@hadoop.apache.org
Subject: [jira] Updated: (HADOOP-5073) Hadoop 1.0 Interface Classification -
 scope (visibility - public/private) and stability
In-Reply-To: <1274929875.1232139599612.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HADOOP-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjay Radia updated HADOOP-5073:
---------------------------------

    Description: 
This jira proposes an interface classification for hadoop interfaces.
The discussion was started in email alias core-dev@hadoop.apache.org in Nov 2008.


  was:
This jira proposes an interface classification for hadoop interfaces.
The discussion was started in email alias core-dev@hadoop.apache.org in Nov 2008.

h2. Interface Taxonomy - Scope & Stability Classification

The interface taxonomy classification provided here is for guidance to developers and users of interfaces.
The classification guides a developer to declare the scope (the targeted audience or users) of an interface and also its stability.
* *Benefits to the user of an interface*: the user knows which interfaces to use or not use, and how stable they are.
* *Benefits to the developer*: it prevents accidental changes to interfaces, and hence accidental impact on users, other components, or the system. This is particularly useful in large systems with many developers who may not all share the state/history of the project.

This classification was derived from a taxonomy used inside Yahoo and
from the OpenSolaris taxonomy (http://www.opensolaris.org/os/community/arc/policies/interface-taxonomy/#Advice).

Interfaces have two main attributes: *Scope* and *Stability*.
* *Scope* - _denotes the potential customers of the interface_.
   For example, many interfaces are merely internal or private interfaces of the implementation, while others are public or external interfaces that applications or clients are expected to use. In POSIX, libc is an external or public interface, while large parts of the kernel are internal or private interfaces. In addition, some interfaces are targeted at specific other subsystems. Identifying the scope helps define the customers or users of an interface and the impact of breaking it. For example, we may be willing to break the compatibility of an interface whose scope is a small number of specific subsystems. On the other hand, one is unlikely to break a protocol interface that millions of internet users depend on.
  The following are useful scopes, in order of increasing (wider) visibility:
**  *project-private*
***  the interface is for internal use _within_ the project and should not be used by applications. It is subject to change at any time without notice. Most interfaces of a project are project-private.
**  *limited-private*
***  the interface is used by a specified set of projects or systems (typically closely related projects). Other projects or systems should not use the interface. Changes to the interface will be communicated/negotiated with the specified projects. For example, in the hadoop project, some interfaces are *hdfs-mapReduce-private* in that they are private to the hdfs and mapReduce projects.
**  *company-private* (*_This is not applicable to open-source projects such as Hadoop._* It is mentioned here for completeness.)
***  the interface can be used by other projects within a company.
**  *public* 
***  the interface is for general use by any application.

* *Stability* -  _denotes when changes can be made to the interface that break compatibility_.
**  *Stable*
***  Can evolve while retaining compatibility across minor release boundaries; can break compatibility only at a major release (i.e. at m.0).
**  *Evolving*
***  Evolving, but can break compatibility at a minor release (i.e. m.x).
**  *Unstable*
***  This usually makes sense for only private interfaces. 
***  However, one may call this out for a _supposedly_ public interface to highlight that it should not be used as an interface; for public interfaces, labeling it as *Not-an-interface* is probably more appropriate than "unstable".
**** Examples of publicly visible interfaces that are unstable (i.e. not-an-interface): GUIs, CLIs whose output format will change.
**  *Deprecated* - should not be used, will be removed in the future.
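
The stability rules above can be sketched in Java as a simple check of when a compatibility break is permitted. This is only an illustration under stated assumptions: the names {{StabilityRules}}, {{Release}} and {{mayBreakAt}} are hypothetical, the {{POINT}} release kind is not mentioned in the text and is added only for contrast, and treating *Unstable* as breakable at any release is an assumption. *Deprecated* is left out because it describes planned removal rather than a compatibility window.

```java
/** Illustrative sketch only; all names here are hypothetical. */
class StabilityRules {

    /** Release boundaries. MAJOR is m.0, MINOR is m.x; POINT is an assumed extra kind. */
    enum Release { MAJOR, MINOR, POINT }

    enum Stability {
        STABLE, EVOLVING, UNSTABLE;

        /** Returns true if an interface of this stability may break compatibility at the given release. */
        boolean mayBreakAt(Release release) {
            switch (this) {
                case STABLE:
                    // Stable: may break only at a major release.
                    return release == Release.MAJOR;
                case EVOLVING:
                    // Evolving: may also break at a minor release.
                    return release == Release.MAJOR || release == Release.MINOR;
                default:
                    // Unstable: assumed to be allowed to break at any release.
                    return true;
            }
        }
    }

    public static void main(String[] args) {
        // Print the full table of stability level versus release kind.
        for (Stability s : Stability.values()) {
            for (Release r : Release.values()) {
                System.out.println(s + " may break at " + r + ": " + s.mayBreakAt(r));
            }
        }
    }
}
```

Scope is deliberately not part of this check: as the FAQ below explains, a private interface may break outside these windows if its providers are willing to change its internal users.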


h2. FAQ
# What is the harm in applications using a private interface that is stable? How is it different from a public stable interface?
   While a private interface marked as stable is targeted to change only at major releases, it may break at other times if the providers of that interface are willing to change the internal users of that interface. Further, a public stable interface is less likely to break even at major releases (even though it is allowed to break compatibility) because the impact of the change is larger. *If you use a private interface (regardless of its stability) you run the risk of incompatibility*.
# Why bother declaring the stability of a private interface? 
**  To communicate the intent to its internal users.
**  To provide guidelines to developers of the interface.
**  The stability may capture other internal properties of the system.
***  e.g. In HDFS, NN-DN protocol stability can help implement rolling upgrades.
***  e.g. In HDFS, FSImage stability can help provide more flexible roll backs.
# How will the classification be recorded for hadoop APIs?
** Each interface or class will have the scope and stability recorded using javadoc tags, annotations, or some other mechanism. Whatever mechanism we choose, the classification must be visible in the generated javadoc.
** APIs of private scope will not be part of the "public" javadoc generated by ant (i.e. by the _ant target_ "javadoc"); they will only be generated for the developer javadoc (generated by _ant target_ "javadoc-dev").
** One can derive the scope of Java classes and Java interfaces from the scope of the package in which they are contained. Hence it is useful to declare the scope of each Java package as public or private (along with the private scope variations).
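
One of the recording options mentioned above, annotations, could look roughly like the following. This is a minimal sketch, not a proposed API: the annotation {{Classification}}, the enums, and the example types are all hypothetical names invented for illustration. The standard {{@Documented}} meta-annotation is what makes an annotation show up in generated javadoc, and runtime retention is chosen here only so that the sketch can read the values back reflectively.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/** Illustrative sketch only; every name in this class is hypothetical. */
class InterfaceClassification {

    enum Scope { PROJECT_PRIVATE, LIMITED_PRIVATE, PUBLIC }

    enum Stability { STABLE, EVOLVING, UNSTABLE }

    /** Records scope and stability on a type; @Documented makes it appear in javadoc. */
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @interface Classification {
        Scope scope();
        Stability stability();
    }

    /** Stand-in for a public, stable API. */
    @Classification(scope = Scope.PUBLIC, stability = Stability.STABLE)
    interface ExamplePublicApi { }

    /** Stand-in for a project-private, evolving implementation class. */
    @Classification(scope = Scope.PROJECT_PRIVATE, stability = Stability.EVOLVING)
    static class ExampleInternalClient { }

    /** Returns "SCOPE/STABILITY" for an annotated type, or "unclassified" otherwise. */
    static String describe(Class<?> type) {
        Classification c = type.getAnnotation(Classification.class);
        return c == null ? "unclassified" : c.scope() + "/" + c.stability();
    }

    public static void main(String[] args) {
        System.out.println(describe(ExamplePublicApi.class));
        System.out.println(describe(ExampleInternalClient.class));
        System.out.println(describe(String.class));
    }
}
```

A per-package default, as suggested in the last bullet, is not shown here; it would need the annotation to be allowed on packages as well.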


h2. Proposed Classification for Hadoop Interfaces

* Scope Public
**  Stable
***  FileSystem, MapReduce, Config, CLI (including output), parts of Mapred.lib, Job Logs API, instrumentation metrics, audit logs
**  Evolving
***  TFile, parts of Mapred.lib, some instrumentation metrics, jmx interface (till it becomes stable)
***  Job logs and job history (some tools, scripts and chukwa use these to analyze job processing)
**  Not-an-interface
***  Web GUI
* Scope Private
**  Limited-Private  Evolving
***  RPC, Metrics (HDFS-MapReduce Private) - once stable, we can consider making these public-stable.
**  Project-Private Stable
***  Intra-HDFS and MR protocols (facilitates rolling upgrades down the road)
***  FSImage 
**** Note: this will enable old versions of HDFS to read a newer FSImage and hence enable more flexible roll backs.
**** Q. Should this be Project-Private Evolving instead?
**** Regardless of the stability of FSImage, new versions of HDFS have to be able to transparently convert older versions and provide roll-back.
**  Project-Private Evolving
***  DFSClient (Q. Should this be "project-private unstable"?)
**  Project-Private Unstable 
***  System logs
***  All implementation classes and interfaces not otherwise classified are considered to be project-private stable.


> Hadoop 1.0 Interface Classification - scope (visibility - public/private) and stability
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5073
>             Project: Hadoop Core
>          Issue Type: Sub-task
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>
> This jira proposes an interface classification for hadoop interfaces.
> The discussion was started in email alias core-dev@hadoop.apache.org in Nov 2008.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.