hadoop-common-dev mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5073) Hadoop 1.0 Interface Classification - scope (visibility - public/private) and stability
Date Fri, 16 Jan 2009 21:11:59 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjay Radia updated HADOOP-5073:
---------------------------------

    Description: 
This jira proposes an interface classification for Hadoop interfaces.
The discussion was started on the core-dev@hadoop.apache.org mailing list in Nov 2008.



  was:
This jira proposes an interface classification for Hadoop interfaces.
The discussion was started on the core-dev@hadoop.apache.org mailing list in Nov 2008.

h2. Interface Taxonomy - Scope & Stability Classification

The interface taxonomy classification provided here is for guidance to developers and users
of interfaces.
The classification guides a developer to declare the scope (or targeted audience or users)
of an interface and also its stability.
* *Benefits to the user of an interface*: The user knows which interfaces to use or avoid, and
how stable they are.
* *Benefits to the developer*: Prevents accidental changes to interfaces, and hence accidental
impact on users, other components, or the system. This is particularly useful in large systems
with many developers who may not all share the project's state and history.

This classification was derived from a taxonomy used inside Yahoo and
from the OpenSolaris taxonomy (http://www.opensolaris.org/os/community/arc/policies/interface-taxonomy/#Advice).

Interfaces have two main attributes: *Scope* and *Stability*.
* *Scope* - _denotes the potential customers of the interface_.
   For example, many interfaces are merely internal or private interfaces of the implementation,
while others are public or external interfaces that applications or clients are expected to
use. On a POSIX system, libc is an external or public interface, while large parts of the kernel
are internal or private interfaces. In addition, some interfaces are targeted at specific
other subsystems. Identifying the scope helps define the customers or users of an interface
and the impact of breaking it. For example, we may be willing to break
the compatibility of an interface whose scope is a small number of specific subsystems. On
the other hand, one is unlikely to break a protocol interface that millions of internet users
depend on.
  The following are useful scopes, in order of increasing visibility:
**  *project-private*
***  the interface is for internal use _within_ the project and should not be used by applications.
It is subject to change at any time without notice. Most interfaces of a project are
project-private.
**  *limited-private*
***  the interface is used by a specified set of projects or systems (typically closely related
projects). Other projects or systems should not use the interface. Changes to the interface
will be communicated/negotiated with the specified projects. For example, in the hadoop project,
some interfaces are *hdfs-mapReduce-private* in that they are private to the hdfs and mapReduce
projects.
**  *company-private* (*_This is not applicable to open-source projects such as Hadoop._* It is
mentioned here for completeness.)
***  the interface can be used by other projects within a company.
**  *public* 
***  the interface is for general use by any application.

* *Stability* - _denotes when changes can be made to the interface that break compatibility_.
**  *Stable*
***  Can evolve while retaining compatibility across minor release boundaries; can break compatibility
only at a major release (i.e. at m.0).
**  *Evolving*
***  Evolving, but can break compatibility at a minor release (i.e. m.x).
**  *Unstable*
***  This usually makes sense only for private interfaces.
***  However, one may call this out for a _supposedly_ public interface to highlight that it
should not be used as an interface; for public interfaces, labeling it as *Not-an-interface*
is probably more appropriate than "unstable".
**** Examples of publicly visible interfaces that are unstable (i.e. not-an-interface): GUIs,
CLIs whose output format will change.
**  *Deprecated* - should not be used, will be removed in the future.
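The following minimal Java sketch makes the two attributes concrete. Scope is encoded as an audience check, and stability as a rule about which releases may carry an incompatible change. All names (`Scope`, `Stability`, `Release`, `InterfaceClass`, `mayUse`, `mayBreak`) are hypothetical and are not an existing Hadoop API. The sketch makes three assumptions: version numbers have the form m.x, company-private is modeled like limited-private with the audience listing the company's projects, and deprecated interfaces are removed only at a major release. The taxonomy itself says only that deprecated interfaces "will be removed in the future".

```java
import java.util.Set;

// Hypothetical encoding of the taxonomy; not an existing Hadoop API.
// Scopes are declared in order of increasing visibility.
enum Scope { PROJECT_PRIVATE, LIMITED_PRIVATE, COMPANY_PRIVATE, PUBLIC }

enum Stability { STABLE, EVOLVING, UNSTABLE, NOT_AN_INTERFACE, DEPRECATED }

// A release number m.x; only major and minor matter for these rules.
final class Release {
    final int major;
    final int minor;

    Release(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }
}

final class InterfaceClass {
    final Scope scope;
    final Set<String> audience; // projects allowed to use it (owner included)
    final Stability stability;

    InterfaceClass(Scope scope, Set<String> audience, Stability stability) {
        this.scope = scope;
        this.audience = audience;
        this.stability = stability;
    }

    // Scope: is callerProject an intended user of this interface?
    // COMPANY_PRIVATE is treated like LIMITED_PRIVATE (assumption).
    boolean mayUse(String callerProject) {
        return scope == Scope.PUBLIC || audience.contains(callerProject);
    }

    // Stability: may an incompatible change ship when upgrading current -> next?
    boolean mayBreak(Release current, Release next) {
        boolean majorBump = next.major > current.major;
        boolean minorBump = next.major == current.major && next.minor > current.minor;
        switch (stability) {
            case STABLE:
            case DEPRECATED:       // assumption: removed only at a major release
                return majorBump;
            case EVOLVING:
                return majorBump || minorBump;
            case UNSTABLE:
            case NOT_AN_INTERFACE: // no compatibility promise at all
                return true;
            default:
                throw new AssertionError(stability);
        }
    }
}

class TaxonomyDemo {
    public static void main(String[] args) {
        // An hdfs-mapReduce-private interface that is still evolving.
        InterfaceClass rpc = new InterfaceClass(
                Scope.LIMITED_PRIVATE, Set.of("hdfs", "mapreduce"), Stability.EVOLVING);
        System.out.println(rpc.mayUse("mapreduce"));                            // true
        System.out.println(rpc.mayUse("pig"));                                  // false
        System.out.println(rpc.mayBreak(new Release(1, 1), new Release(1, 2))); // true
    }
}
```

The demo classifies an hdfs-mapReduce-private RPC interface as evolving, matching the proposal for RPC later in this issue. Scope and stability are kept as separate fields, as in the taxonomy. A `false` from `mayUse` marks a dependency that is at risk whatever its stability.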


h2. FAQ
# What is the harm in applications using a private interface that is stable? How is it different
from a public stable interface?
   While a private interface marked as stable is targeted to change only at major releases,
it may break at other times if the providers of that interface are willing to change the
internal users of that interface. Further, a public stable interface is less likely to break
even at major releases (even though it is allowed to break compatibility) because the impact
of the change is larger. *If you use a private interface (regardless of its stability), you
run the risk of incompatibility*.
# Why bother declaring the stability of a private interface? 
**  To communicate the intent to its internal users.
**  To provide guidelines to developers of the interface
**  The stability may capture other internal properties of the system
***  e.g. In HDFS, NN-DN protocol stability can help implement rolling upgrades.
***  e.g. In HDFS, FSImage stability can help provide more flexible rollbacks.
# How will the classification be recorded for Hadoop APIs?
** Each interface or class will have its scope and stability recorded using javadoc tags,
annotations, or some other mechanism. Whatever mechanism we choose, the classification must
be visible in the generated javadoc.
** APIs of private scope will not be part of the public javadoc generated by ant (i.e. by the
_ant target_ "javadoc"); they will only be generated for the developer javadoc (generated
by the _ant target_ "javadoc-dev").
** One can derive the scope of java classes and java interfaces by the scope of the package
in which they are contained. Hence it is useful to declare the scope of each java package
as public or private (along with the private scope variations).
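The FAQ leaves the recording mechanism open. Java annotations are one possible mechanism, sketched below. The annotations are marked `@Documented` so they appear in generated javadoc. A class's scope comes from its own annotation, falling back to its package's annotation, which follows the package-derivation idea above. The names `DeclaredScope`, `DeclaredStability`, `ScopeLevel`, `StabilityLevel`, `Classification`, and `ExampleRpcClient` are hypothetical. In real code a package-level annotation would go in that package's `package-info.java`, which a single-file sketch cannot show.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical classification values; company-private is omitted because
// the taxonomy says it does not apply to open-source projects such as Hadoop.
enum ScopeLevel { PROJECT_PRIVATE, LIMITED_PRIVATE, PUBLIC }

enum StabilityLevel { STABLE, EVOLVING, UNSTABLE, DEPRECATED }

// @Documented makes the annotation show up in generated javadoc;
// RUNTIME retention lets tools read it via reflection.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.PACKAGE})
@interface DeclaredScope {
    ScopeLevel value();
    String[] projects() default {}; // intended users when LIMITED_PRIVATE
}

@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.PACKAGE})
@interface DeclaredStability {
    StabilityLevel value();
}

final class Classification {
    private Classification() {}

    // A class-level annotation wins; otherwise the scope is derived from the
    // enclosing package (declared in its package-info.java). Returns null if
    // neither is annotated.
    static ScopeLevel scopeOf(Class<?> c) {
        DeclaredScope own = c.getAnnotation(DeclaredScope.class);
        if (own != null) {
            return own.value();
        }
        Package pkg = c.getPackage();
        DeclaredScope inherited = (pkg == null) ? null : pkg.getAnnotation(DeclaredScope.class);
        return (inherited == null) ? null : inherited.value();
    }
}

// Hypothetical class carrying an explicit classification.
@DeclaredScope(value = ScopeLevel.LIMITED_PRIVATE, projects = {"hdfs", "mapreduce"})
@DeclaredStability(StabilityLevel.EVOLVING)
class ExampleRpcClient {}

class ClassificationDemo {
    public static void main(String[] args) {
        System.out.println(Classification.scopeOf(ExampleRpcClient.class)); // LIMITED_PRIVATE
        System.out.println(Classification.scopeOf(String.class));           // null
    }
}
```

A javadoc doclet or build step could then read `DeclaredScope` to keep private-scope classes out of the public javadoc, as proposed above. That filtering step is not shown.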


h2. Proposed Classification for Hadoop Interfaces

* Scope Public
**  Stable
***  FileSystem, MapReduce, Config, CLI (including output), parts of Mapred.lib, Job Logs
API, instrumentation metrics, Audit logs
**  Evolving
***  TFile, parts of Mapred.lib, some instrumentation metrics, JMX interface (until it becomes
stable)
***  Job logs and job history (some tools, scripts, and Chukwa use these to analyze job processing)
**  Not An interface
***  Web GUI
* Scope Private
**  Limited-Private  Evolving
***  RPC, Metrics (HDFS-MapReduce Private) - once stable, we can consider making these public-stable.
**  Project-Private Stable
***  Intra-HDFS and MR protocols (facilitates rolling upgrades down the road)
***  FSImage 
**** Note: this will enable old versions of HDFS to read newer FSImages and hence enable more
flexible rollbacks.
**** Q. Should this be Project-Private Evolving instead?
**** Regardless of the stability of FSImage, new versions of HDFS have to be able to transparently
convert older versions and provide roll-back.
**  Project-Private Evolving
***  DFSClient (Q. Should this be "project-private unstable"?)
**  Project-Private Unstable 
***  System logs
***  All implementation classes and interfaces not otherwise classified are considered to
be project-private unstable.







> Hadoop 1.0 Interface Classification - scope (visibility - public/private) and stability
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5073
>             Project: Hadoop Core
>          Issue Type: Sub-task
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>
> This jira proposes an interface classification for Hadoop interfaces.
> The discussion was started on the core-dev@hadoop.apache.org mailing list in Nov 2008.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

