hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/HBaseWireCompatibility" by JimmyXiang
Date Mon, 13 Feb 2012 20:05:49 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/HBaseWireCompatibility" page has been changed by JimmyXiang:

New page:
 * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Glossary|Glossary]]
 * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-MotivationandGoals|Motivation and Goals]]
 * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Requirements|Requirements]]
 * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Design|Design]]
  * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Wireformat|Wire format]]
  * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-RPC|RPC]]
  * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Interfaces|Interfaces]]
 * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Phasing|Phasing]]
  * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Phase0:HBASE4403:SeparateexistingAPIsintopublicandprivateinterfaces|Phase 0: HBASE-4403: Separate existing APIs into public and private interfaces]]
  * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Phase1:CompatibilitybetweenclientapplicationsandHBaseclusters|Phase 1: Compatibility between client applications and HBase clusters]]
  * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Phase2:HBaseclusterrollingupgradewithinsamemajorversion|Phase 2: HBase cluster rolling upgrade within same major version]]
 * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Openquestions|Open questions]]
 * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Appendix|Appendix]]
  * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-Futurework(outofscopeofthisdocument)|Future work (out of scope of this document)]]
 * [[https://wiki.cloudera.com/display/engineering/HBase+wire+compatibility+plan#HBasewirecompatibilityplan-References|References]]

=== Glossary ===
||Major version||First number in the version, to the left of the period. For example, in version 2.3, the major version is "2".||
||Minor version||Second number in the version, immediately to the right of the period. For example, in version 2.3, the minor version is "3".||
||Compatibility window||Range of consecutive major versions where compatibility between two entities is guaranteed.||

=== Motivation and Goals ===
The current lack of a concrete versioning story for HBase is limiting from both an operational and a development perspective. We propose a "first-pass" versioning story (which can be expanded upon later) that addresses the following use cases and concerns:


 * '''Decouple client applications from HBase''': HBase clients are part of a separate application and are often administered separately from the HBase cluster. Today, the application and cluster must be upgraded in lockstep. Clients should interoperate with HBase region servers (RSs) and masters that are running different major versions. This allows for the following operational scenarios:
  * Multiple pods: HBase clients may write to multiple HBase clusters / pods (sharded clusters), and the shards may be upgraded separately.
  * Application-level replication: An HBase installation with active and standby clusters should be able to upgrade either cluster, and HBase clients should work with both.
 * '''No downtime for minor version upgrades'''
 * '''Simplified support for bugfixes, upgrades, and testing''' - no need for specialized migration scripts
 * '''Higher developer cadence in the community''' - developers can add functionality without worrying about breaking version compatibility

=== Requirements ===
 * HBase servers running different '''minor''' versions shall interoperate with each other in an extensible manner.
 * HBase clients and servers running different '''major''' versions shall interoperate in an extensible manner.
  * For example, when the client is running version A and the server is running version B, anything that one side does not understand is ignored, given a default value, or otherwise handled in an appropriate manner.
 * Formats and protocols shall be extensible to allow for new functionality such as RPC tracing.
 * Developers shall be able to augment the RPC protocol with '''new''' methods within minor and major version upgrades.
 * Performance of critical-path operations (Get/Put) shall degrade by no more than 10% relative to the current 0.92 version on YCSB load tests (that is, read, update, scan, and insert should each be no more than 10% slower).
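
The first two requirements can be read as a simple rule over version numbers. The sketch below is only an illustration of that reading: `parse_version`, `can_interoperate`, and the `window` parameter are hypothetical names, and the default window of 4 major versions is just the figure floated under Open questions, not a decided value.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Split a version such as "2.3" into (major, minor), per the glossary."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def can_interoperate(kind: str, a: str, b: str, window: int = 4) -> bool:
    """Hypothetical check for the two interoperability requirements.

    "server-server": both sides must share a major version (minor may differ).
    "client-server": major versions must fall inside one compatibility window,
    i.e. a range of `window` consecutive major versions (assumed size).
    """
    (major_a, _), (major_b, _) = parse_version(a), parse_version(b)
    if kind == "server-server":
        return major_a == major_b
    if kind == "client-server":
        return abs(major_a - major_b) < window
    raise ValueError(f"unknown kind: {kind}")
```

For instance, under these assumptions `can_interoperate("server-server", "2.3", "2.7")` holds, while `can_interoperate("server-server", "2.3", "3.0")` does not.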

=== Design ===
===== Wire format =====
Protobuf vs. Thrift vs. Avro

We propose to use protobuf for the wire format. The primary reason is that the current HBase RPC engine (see HADOOP-7379) supports protobuf-encoded data, and protobuf is relatively more stable than the alternatives. In addition, Hadoop RPC uses protobuf, and the community may eventually want Hadoop and HBase to share the same RPC.

We also propose to change the HBase RPC connection header from Writable to protobuf so that the HBase RPC is programming-language agnostic.
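
To illustrate why a protobuf-style header is both language agnostic and extensible, here is a minimal hand-rolled sketch of the protobuf wire encoding (varints and length-delimited fields only). It does not use the protobuf library, and the header layout is an assumption made for the example: field 1 as a protocol name and field 2 as a client version are hypothetical, not the actual HBase connection header.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value) -> bytes:
    """Encode an int as wire type 0 (varint) or a str as wire type 2."""
    if isinstance(value, int):
        return encode_varint(number << 3 | 0) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint(number << 3 | 2) + encode_varint(len(data)) + data


# Hypothetical header layout known to this (older) reader.
KNOWN_FIELDS = {1: "protocol", 2: "client_version"}


def decode_header(buf: bytes) -> dict:
    """Decode known fields, apply defaults, and skip unknown fields."""
    header = {"protocol": "", "client_version": 0}  # defaults for absent fields
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        name = KNOWN_FIELDS.get(number)
        if name is not None:  # unknown field numbers are silently ignored
            header[name] = value
    return header
```

A header written by a newer peer that appends an extra field (say, field 3 carrying a trace id) decodes to the same result as the old header, because the reader skips field numbers it does not know.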

===== RPC =====
Currently, the HBase RPC engine does not support async IO or protocol negotiation. These features do not impact compatibility, so they can evolve separately and are not in scope for this document.

===== Interfaces =====
 1. Client talks to ZK to find out the location of the master and the root region server.
 1. Client applications talk to RS using '''HRegionInterface''' to read from, write to, or scan a table, and so on.
 1. Client applications talk to master using '''HMasterInterface''' to dynamically create a table, add a column family, and so on.
 1. Master talks to RS using '''HRegionInterface''' to open/close/move/split/flush regions, and so on.
 1. Master puts data in ZK to store the active master and root region server location, create log splitting tasks, track RS status, and so on.
 1. RS reads and updates data in ZK to track log splitting tasks (grabbing a task and reporting its status), creates a node for itself so that the master can track its status, tracks the master location and cluster status, and so on.
 1. RS talks to master using '''HMasterRegionInterface''' to report RS load, RS fatal errors, and RS start-up.
 1. Occasionally, RS talks to the root region or meta region using '''HRegionInterface''' to check the status of a region, create new daughter regions during region splitting, and so on.
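
The interactions above can be written down as a small numbered graph, which is convenient when checking which interactions share a mechanism. This is a sketch under assumptions: the numbering simply follows the order of the list, and the endpoint labels and the `edges_using` helper are illustrative names, not part of HBase.

```python
# (source, target, mechanism) for interactions 1-8, in list order.
INTERFACES = {
    1: ("client", "ZK", "ZooKeeper"),
    2: ("client", "RS", "HRegionInterface"),
    3: ("client", "master", "HMasterInterface"),
    4: ("master", "RS", "HRegionInterface"),
    5: ("master", "ZK", "ZooKeeper"),
    6: ("RS", "ZK", "ZooKeeper"),
    7: ("RS", "master", "HMasterRegionInterface"),
    8: ("RS", "RS", "HRegionInterface"),  # RS hosting the root/meta region
}


def edges_using(mechanism: str) -> list:
    """Return the interaction numbers that rely on the given mechanism."""
    return sorted(n for n, (_, _, m) in INTERFACES.items() if m == mechanism)
```

Here `edges_using("HRegionInterface")` groups interactions 2, 4, and 8, which all go through the same RPC interface.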

=== Phasing ===
The order of phases is based on priority. They can be done in parallel if there are enough resources.

===== Phase 0: HBASE-4403: Separate existing APIs into public and private interfaces =====
In order to define which APIs can be changed, we need to separate existing APIs into public
and private.

===== Phase 1: Compatibility between client applications and HBase clusters =====
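The goal of this phase is to make HBase client applications work properly with HBase clusters of different major and minor versions.

Note: this phase deals with interactions 1, 2, and 3 in the interface graph (the numbered list under Interfaces); 8 comes "for free" because it uses the same HRegionInterface calls as 2.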


 * Replace RPC negotiation with extensible PB-based types
 * Replace root and master address znodes in ZK with PB-enabled types (goal: client's ZK interactions become extensible) (1 in the graph)
 * Replace existing HRegionInterface calls for reading from, writing to, and scanning a table with PB-enabled types (goal: client->RS and RS->RS RPC becomes extensible) (2 in the graph)
 * Replace existing HMasterInterface calls with PB-enabled types (goal: client->master RPC becomes extensible) (3 in the graph)
 * Replace data stored in .META. and -ROOT- tables with PB-enabled types (goal: client can read from old and/or new .META. and -ROOT- tables) (2 in the graph)
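
The last bullet implies a reader that accepts both the old and the new encoding during a transition. One way to do that, sketched here under assumptions, is to prefix new-format values with a marker and fall back to the legacy parser otherwise. The `MAGIC` prefix and both parser callbacks are hypothetical; this page does not say how old and new values are told apart.

```python
from typing import Any, Callable

MAGIC = b"PBUF"  # hypothetical marker placed in front of new-format values


def wrap_new_format(payload: bytes) -> bytes:
    """Prefix a new-format payload so that readers can recognise it."""
    return MAGIC + payload


def read_value(
    raw: bytes,
    parse_new: Callable[[bytes], Any],
    parse_legacy: Callable[[bytes], Any],
) -> Any:
    """Dispatch to the new or the legacy parser based on the prefix."""
    if raw.startswith(MAGIC):
        return parse_new(raw[len(MAGIC):])
    return parse_legacy(raw)
```

A caveat of this approach is that a legacy value which happens to start with the marker bytes would be misread, so the marker has to be chosen so that it cannot begin a valid legacy value.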

===== Phase 2: HBase cluster rolling upgrade within same major version =====
The goal of this phase is to allow an HBase cluster to be upgraded in a rolling fashion within the same major version.

Note: this phase deals with interactions 4, 5, 6, and 7 in the interface graph.


 * Replace existing HRegionInterface calls for opening, closing, moving, splitting, and flushing regions with PB-enabled types (goal: master->RS RPC becomes extensible) (4 in the graph)
 * Replace Writables used in ZK for communication between RS and master with PB-enabled types (goal: RS and master ZK interactions become extensible) (5, 6 in the graph)
 * Replace existing HMasterRegionInterface calls with PB-enabled types (goal: RS->master RPC becomes extensible) (7 in the graph)
 * Add version information to each server's ZK data (master and RSs) (goal: track live version numbers, used to automatically keep new features in persistent data formats switched off until all servers have reached the new version) (5, 6 in the graph)
 * Add RS version information to the master status UI
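
The version-tracking bullet describes gating new persistent formats on the oldest live server. A minimal sketch of that rule follows; `feature_enabled`, the `live_versions` mapping, and the `(major, minor)` tuples are illustrative assumptions standing in for whatever is actually read from each server's ZK data.

```python
from typing import Dict, Tuple

Version = Tuple[int, int]  # (major, minor)


def feature_enabled(live_versions: Dict[str, Version], required: Version) -> bool:
    """Enable a feature only once every live server has reached `required`.

    `live_versions` maps a server name to the version it advertises
    (hypothetically, via its ZK data). An empty cluster enables nothing.
    """
    if not live_versions:
        return False
    return min(live_versions.values()) >= required
```

During a rolling upgrade the feature stays off while any server still reports the older version and switches on once the last server has been upgraded.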

=== Open questions ===
 - How do ZK security and HBase RPC security play into this? (They should be orthogonal, but please make this clearer.)
 - Should pluggable encodings (thrift/avro/pb/writable) be in scope?
 - Should async IO servers and clients be in scope or not?

 - What is the policy for existing versions (89, 90, 92, 94)? Do we support them, or require one major upgrade before they get this story?
 - Developers should be able to remove deprecated methods or arguments to maintain flexibility, but cannot do that within the compatibility window. What should our compatibility window be? 2 years (roughly 4 major versions)?
 - What is the ZK version interoperability story?
 - Should architectural-level changes require a major version bump?

=== Appendix ===
===== Future work (out of scope of this document) =====
 * It may be possible to extend RPC with metadata that enables new functionality such as RPC tracing.
 * Unify this with Hadoop RPC.
 * Online rolling upgrade of a single cluster between major versions: Today, major version upgrades of a single cluster require downtime to upgrade all services in lockstep, while some minor version updates can be applied via the rolling-restart script. HBase should remain available through this process.
 * Partial rollout: HBase clusters should allow some nodes to "try" a newer version for testing purposes. Today, this is a manual process and possible only within minor versions (likely possible; we would like to not exclude this possibility).
 * Cluster configuration changes: HBase should remain available as configuration changes (hbase-site.xml) or hotfixes are applied. Today, the rolling-restart script can be used to perform this operation.
 * Replication across different versions
 * Disaster recovery: Operators should be able to smoke test a new version during the rolling upgrade before turning on the new features for general use. If anything goes wrong during the rolling upgrade, it should be possible to roll back.
 * ZK wire compatibility: This is necessary for RPCs between different versions of HBase and ZK. Currently, ZK supports backward compatibility for one version only. Different versions of HBase could support different ZK versions.
 * HDFS wire compatibility
 * Data format changes may prevent minor or major version roll-back.
 * Security RPC data compression/encryption changes may prevent minor or major version roll-back.
 * Persistent data is stored in version-specific formats in HDFS (XML configs, regioninfo, tableinfo). Some of these data encodings and formats are directly exposed; for example, ZK is not exposed as an API.

=== References ===
 * Dapper: http://research.google.com/pubs/pub36356.html
 * Cross version upgrade and compatibility: [[https://issues.apache.org/jira/browse/HBASE-5305|HBASE-5305]]
 * Redo IPC/RPC: [[https://issues.apache.org/jira/browse/HBASE-2182|HBASE-2182]]
 * HDFS wire compatibility: [[https://issues.apache.org/jira/browse/HADOOP-7347|HADOOP-7347]]
 * HDFS client wire compatibility: [[https://issues.apache.org/jira/browse/HDFS-2060|HDFS-2060]]
 * HDFS data protocol wire compatibility: [[https://issues.apache.org/jira/browse/HDFS-2058|HDFS-2058]]
 * Use protobuf objects in existing IPC: [[https://issues.apache.org/jira/browse/HADOOP-7379|HADOOP-7379]]
