hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Compatibility" by ArpitAgarwal
Date Tue, 29 Jul 2014 21:24:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Compatibility" page has been changed by ArpitAgarwal:

  #format wiki
  #language en
- = This page is out of date. Please refer to http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html henceforth. Thanks =
+ '''Contents moved to [[http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html|http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html]]'''
- == Apache Hadoop Compatibility ==
- The goal of this page is to describe the issues that affect compatibility between Hadoop
releases for Hadoop developers, downstream projects and end users.
- Here are some existing relevant JIRAs and pages related to the topic:
-  1. Describe the annotations an interface should have as per our existing interface classification
scheme (see [[https://issues.apache.org/jira/browse/HADOOP-5073|HADOOP-5073]])
-  2. Cover compatibility items that are beyond the scope of API classification,  along the
lines of those discussed in [[https://issues.apache.org/jira/browse/HADOOP-5071|HADOOP-5071]],
focused on Hadoop v1.
-  3. The [[Roadmap]] captures release policies; some of the content is out of date.
- ''Note to downstream projects/users'': If you are concerned about compatibility at any level, we strongly encourage you to follow the Hadoop developer mailing lists and track JIRA issues that may concern you. You are also strongly advised to verify that your code works against beta releases of forthcoming Hadoop versions, as that is a time when identified regressions can be corrected rapidly; if you only test when a new final release ships, the time to fix is likely to be at least three months.
- === Compatibility types ===
- This section describes the various types of compatibility.
- ==== Java API  ====
- Hadoop interfaces and classes are annotated to describe the intended audience and stability
in order to maintain compatibility with previous releases. See HADOOP-5073 for more details.
-  * InterfaceAudience: captures the intended audience, possible values are Public (for outside
users), LimitedPrivate (for other Hadoop components, and closely related projects like HBase),
Private (for within component use)
-  * InterfaceStability: describes what types of interface changes are expected. Possible
values are Stable, Evolving, Unstable, and Deprecated. See HADOOP-5073 for details.
- ===== Usecases =====
-  * Public-Stable API compatibility is required to ensure end-user programs and downstream
projects continue to work without any changes.
-  * LimitedPrivate-Stable API compatibility is required to allow upgrade of individual components
across minor releases.
-  * Private-Stable API compatibility is required for rolling upgrades.
- ==== Semantics compatibility ====
- Apache Hadoop strives to ensure that the behavior of APIs remains consistent over versions,
though changes for correctness may result in changes in behavior. That is: if you relied on
something which we consider to be a bug, it may get fixed.
- We are in the process of specifying some APIs more rigorously, enhancing
- our test suites to verify compliance with the specification, effectively
- creating a formal specification for the subset of behaviors that can be
- easily tested. We welcome involvement in this process, from both users and
- implementors of our APIs.
- ==== Wire compatibility ====
- Wire compatibility concerns the data being transmitted over the wire between components.
Hadoop uses protocol buffers for most RPC communication. Preserving compatibility requires
prohibiting modification to the required fields of the corresponding protocol buffer. Optional
fields may be added without breaking backwards compatibility. The protocols can be categorized
as follows:
-  * Client-Server: communication between Hadoop clients and servers (e.g. the HDFS client
to NameNode protocol, or the YARN client to ResourceManager protocol).
-  * Client-Server (Admin): It’s worth distinguishing a subset of the Client-Server protocols used solely by administrative commands (e.g. the HA admin protocol), as these protocols may be changed with less impact than general Client-Server protocols.
-  * Server-Server: communication between servers (e.g. the protocol between the DataNode
and NameNode, or NodeManager and ResourceManager)
- Non-RPC communication should be considered as well, for example using HTTP to transfer an
HDFS image as part of snapshotting or transferring MapTask output.
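The required/optional rule above can be sketched with a hypothetical protocol buffer message (proto2 syntax; this is not an actual Hadoop `.proto`, and the field names are illustrative):

```protobuf
// Hypothetical message illustrating the compatibility rule.
message BlockInfoProto {
  required uint64 blockId = 1;      // required fields must never be modified or removed
  required uint64 length  = 2;
  optional uint32 replication = 3;  // optional fields may be added later;
                                    // older peers simply ignore them
}
```

Adding a new optional field with a fresh tag number is backwards compatible; changing the type, tag, or presence of a required field breaks the wire protocol.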
- ==== Metrics / JMX ====
- While Metrics API compatibility is governed by Java API compatibility, the actual metrics exposed by Hadoop need to be compatible for users to be able to automate against them (scripts etc.). Adding new metrics is compatible; modifying (e.g. changing the unit of measurement) or removing existing metrics breaks compatibility. Likewise, changes to JMX MBean object names also break compatibility.
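To see why MBean object names matter, consider how a monitoring script addresses a metric. The sketch below uses the standard `javax.management.ObjectName` API; the `Hadoop:` domain pattern is illustrative, and the exact names vary by daemon and release.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class MBeanNameSketch {
    // Builds a Hadoop-style MBean name, e.g. "Hadoop:service=NameNode,name=RpcActivity".
    static ObjectName hadoopBean(String service, String metric) {
        try {
            return new ObjectName("Hadoop:service=" + service + ",name=" + metric);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        ObjectName name = hadoopBean("NameNode", "FSNamesystemState");
        // Monitoring tools key off the domain and the key properties; renaming
        // either one breaks every consumer that queries this name.
        System.out.println(name.getDomain());               // Hadoop
        System.out.println(name.getKeyProperty("service")); // NameNode
    }
}
```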
- ==== REST APIs ====
- REST API compatibility corresponds to both the request (URLs) and responses to each request
(content, which may contain other URLs). Hadoop REST APIs are specifically meant for stable
use by clients. The following are the exposed REST APIs:
-  * WebHDFS (as supported by HttpFS) - Stable
-  * WebHDFS (as supported by HDFS) - Stable
-  * NodeManager
-  * ResourceManager
-  * MR JobHistoryServer
-  * Servlets - JMX, conf
- ==== CLI Commands ====
- Users and admins use Command Line Interface commands, either directly or via scripts, to access/modify data and run jobs/apps. Changing the path of a command, removing or renaming command-line options, changing the order of arguments, or changing a command's return code or output may break compatibility and adversely affect users.
- ==== Directory Structure ====
- Userlogs, job history and output are stored on disk, either local or on HDFS. Changing the directory structure of these user-accessible files breaks compatibility, even in cases where the original path is preserved via symbolic links (if, for example, the path is accessed by a servlet that is configured not to follow links).
- ==== Classpath ====
- User applications (e.g. Java programs which are not MR jobs) built against Hadoop might
add all Hadoop jars (including Hadoop’s dependencies) to the application’s classpath.
Adding new dependencies or updating the version of existing dependencies may break user programs.
- ==== Environment Variables ====
- Users and related projects often utilize the exported environment variables (e.g. HADOOP_CONF_DIR); therefore, removing or renaming environment variables is an incompatible change.
- ==== Hadoop Configuration Files ====
- Modifications to Hadoop configuration properties (both key names and the units of values) may break compatibility. We assume that users who use Hadoop configuration objects to pass information to jobs ensure their properties do not conflict with the key prefixes defined by Hadoop. The following key prefixes are used by Hadoop daemons and should be avoided:
-  * hadoop.*
-  * io.*
-  * ipc.*
-  * fs.*
-  * net.*
-  * file.*
-  * ftp.*
-  * s3.*
-  * kfs.*
-  * ha.*
-  * dfs.*
-  * mapred.*
-  * mapreduce.*
-  * yarn.*
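A minimal sketch of how a downstream project might guard against colliding with the reserved prefixes listed above (the helper class and method names are hypothetical; Hadoop itself provides no such check):

```java
import java.util.List;

public class ReservedKeys {
    // Key prefixes reserved by Hadoop daemons, per the list above.
    private static final List<String> RESERVED = List.of(
        "hadoop.", "io.", "ipc.", "fs.", "net.", "file.", "ftp.",
        "s3.", "kfs.", "ha.", "dfs.", "mapred.", "mapreduce.", "yarn.");

    // Returns true when a user-supplied key collides with a reserved prefix.
    static boolean collides(String key) {
        return RESERVED.stream().anyMatch(key::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(collides("dfs.replication"));  // true: reserved
        System.out.println(collides("com.example.batch.size")); // false: safe
    }
}
```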
- ==== Data Formats ====
- Hadoop uses particular formats to store data and metadata. Modifying these formats can interfere with rolling upgrades and hence requires compatibility guarantees. For instance, modifying the IFile format will require re-execution of jobs in flight during a rolling upgrade. Preserving certain formats, like HDFS metadata, allows access to and modification of data across releases.
