hbase-commits mailing list archives

From els...@apache.org
Subject [1/2] hbase git commit: HBASE-16574 Book updates for backup and restore
Date Mon, 20 Nov 2017 18:16:43 GMT
Repository: hbase
Updated Branches:
  refs/heads/branch-2 eb17a2f28 -> 086a03797
  refs/heads/master 9b7b83d86 -> 8f806ab48


HBASE-16574 Book updates for backup and restore

Signed-off-by: Josh Elser <elserj@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/8f806ab4
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/8f806ab4
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/8f806ab4

Branch: refs/heads/master
Commit: 8f806ab48643b16a975691bf6edf7887706327f1
Parents: 9b7b83d
Author: Frank Welsch <fwelsch@jps.net>
Authored: Fri Sep 23 18:00:42 2016 -0400
Committer: Josh Elser <elserj@apache.org>
Committed: Mon Nov 20 13:12:00 2017 -0500

----------------------------------------------------------------------
 src/main/asciidoc/_chapters/backup_restore.adoc | 912 +++++++++++++++++++
 src/main/asciidoc/book.adoc                     |   5 +-
 .../resources/images/backup-app-components.png  | Bin 0 -> 24366 bytes
 .../resources/images/backup-cloud-appliance.png | Bin 0 -> 30114 bytes
 .../images/backup-dedicated-cluster.png         | Bin 0 -> 24950 bytes
 .../resources/images/backup-intra-cluster.png   | Bin 0 -> 19348 bytes
 6 files changed, 914 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/8f806ab4/src/main/asciidoc/_chapters/backup_restore.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/backup_restore.adoc b/src/main/asciidoc/_chapters/backup_restore.adoc
new file mode 100644
index 0000000..a9dbcf5
--- /dev/null
+++ b/src/main/asciidoc/_chapters/backup_restore.adoc
@@ -0,0 +1,912 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[[casestudies]]
+= Backup and Restore
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+[[br.overview]]
+== Overview
+
+Backup and restore is a standard operation provided by many databases. An effective backup
and restore
+strategy helps ensure that users can recover data in case of unexpected failures. The HBase
backup and restore
+feature helps ensure that enterprises using HBase as a canonical data repository can recover
from catastrophic
+failures. Another important feature is the ability to restore the database to a particular
+point-in-time, commonly referred to as a snapshot.
+
+The HBase backup and restore feature provides the ability to create full backups and incremental
backups on
+tables in an HBase cluster. The full backup is the foundation on which incremental backups
are applied
+to build iterative snapshots. Incremental backups can be run on a schedule to capture changes
over time,
+for example by using a Cron task. Incremental backups are more cost-effective than full backups
because they only capture
+the changes since the last backup and they also enable administrators to restore the database
to any prior incremental backup. Furthermore, the
+utilities also enable table-level data backup-and-recovery if you do not want to restore
the entire dataset
+of the backup.
+
+The backup and restore feature supplements the HBase Replication feature. While HBase replication
is ideal for
+creating "hot" copies of the data (where the replicated data is immediately available for
query), the backup and
+restore feature is ideal for creating "cold" copies of data (where a manual step must be
taken to restore the system).
+Previously, users only had the ability to create full backups via the ExportSnapshot functionality.
The incremental
+backup implementation is the novel improvement over the previous "art" provided by ExportSnapshot.
+
+[[br.terminology]]
+== Terminology
+
+The backup and restore feature introduces new terminology which can be used to understand
how control flows through the
+system.
+
+* _A backup_: A logical unit of data and metadata which can restore a table to its state
at a specific point in time.
+* _Full backup_: a type of backup which wholly encapsulates the contents of the table at
a point in time.
+* _Incremental backup_: a type of backup which contains the changes in a table since a full
backup.
+* _Backup set_: A user-defined name which references one or more tables over which a backup
can be executed.
+* _Backup ID_: A unique name that identifies one backup from the rest, e.g. `backupId_1467823988425`
+
+[[br.planning]]
+== Planning
+
+There are some common strategies which can be used to implement backup and restore in your
environment. The following section
+shows how these strategies are implemented and identifies potential tradeoffs with each.
+
+WARNING: The backup and restore tools have not been tested on Transparent Data Encryption (TDE) enabled HDFS clusters.
+This is related to the open issue link:https://issues.apache.org/jira/browse/HBASE-16178[HBASE-16178].
+
+[[br.intracluster.backup]]
+=== Backup within a cluster
+
+This strategy stores the backups on the same cluster where the backup was taken. This approach is only appropriate for testing
+as it does not provide any additional safety on top of what the software itself already provides.
+
+.Intra-Cluster Backup
+image::backup-intra-cluster.png[]
+
+[[br.dedicated.cluster.backup]]
+=== Backup using a dedicated cluster
+
+This strategy provides greater fault tolerance and provides a path towards disaster recovery.
In this setting, you will
+store the backup on a separate HDFS cluster by supplying the backup destination cluster’s
HDFS URL to the backup utility.
+You should consider backing up to a different physical location, such as a different data
center.
+
+Typically, a backup-dedicated HDFS cluster uses a more economical hardware profile to save
money.
+
+.Dedicated HDFS Cluster Backup
+image::backup-dedicated-cluster.png[]
+
+[[br.cloud.or.vendor.backup]]
+=== Backup to the Cloud or a storage vendor appliance
+
+Another approach to safeguarding HBase incremental backups is to store the data on provisioned,
secure servers that belong
+to third-party vendors and that are located off-site. The vendor can be a public cloud provider
or a storage vendor who uses
+a Hadoop-compatible file system, such as S3 and other HDFS-compatible destinations.
+
+.Backup to Cloud or Vendor Storage Solutions
+image::backup-cloud-appliance.png[]
+
+NOTE: The HBase backup utility does not support backup to multiple destinations. A workaround
is to manually create copies
+of the backup files from HDFS or S3.
+
+[[br.initial.setup]]
+== First-time configuration steps
+
+This section contains the necessary configuration changes that must be made in order to use
the backup and restore feature.
+As this feature makes significant use of YARN's MapReduce framework to parallelize these
I/O heavy operations, configuration
+changes extend outside of just `hbase-site.xml`.
+
+=== Allow the "hbase" system user in YARN
+
+The YARN *container-executor.cfg* configuration file must have the following property setting:
_allowed.system.users=hbase_. No spaces
+are allowed in entries of this configuration file.
+
+WARNING: Skipping this step will result in runtime errors when executing the first backup
tasks.
+
+*Example of a valid container-executor.cfg file for backup and restore:*
+
+[source]
+----
+yarn.nodemanager.log-dirs=/var/log/hadoop/mapred
+yarn.nodemanager.linux-container-executor.group=yarn
+banned.users=hdfs,yarn,mapred,bin
+allowed.system.users=hbase
+min.user.id=500
+----
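The two rules above (no spaces in any entry, and `hbase` present in `allowed.system.users`) can be checked before the first backup is attempted. The following is a minimal sketch and not part of HBase or YARN: the function name `check_container_executor_cfg` is hypothetical, and because the location of the file varies by installation, the path is passed in as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for container-executor.cfg (not an HBase or YARN tool).
check_container_executor_cfg() {
  local cfg="$1" status=0

  # The text above states that entries must not contain spaces.
  if grep -n ' ' "$cfg" >&2; then
    echo "error: the lines listed above contain spaces" >&2
    status=1
  fi

  # hbase must appear as one entry of the comma-separated allowed.system.users list.
  if ! grep -Eq '^allowed\.system\.users=(.*,)?hbase(,.*)?$' "$cfg"; then
    echo "error: hbase is not listed in allowed.system.users" >&2
    status=1
  fi

  return "$status"
}
```

Call it as `check_container_executor_cfg <path-to-container-executor.cfg>`; a non-zero exit status means at least one of the two rules is violated.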
+
+=== HBase specific changes
+
+Add the following properties to hbase-site.xml and restart HBase if it is already running.
+
+NOTE: The ",..." is an ellipsis meant to imply that this is a comma-separated list of values,
not literal text which should be added to hbase-site.xml.
+
+[source]
+----
+<property>
+  <name>hbase.backup.enable</name>
+  <value>true</value>
+</property>
+<property>
+  <name>hbase.master.logcleaner.plugins</name>
+  <value>org.apache.hadoop.hbase.backup.master.BackupLogCleaner,...</value>
+</property>
+<property>
+  <name>hbase.procedure.master.classes</name>
+  <value>org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager,...</value>
+</property>
+<property>
+  <name>hbase.procedure.regionserver.classes</name>
+  <value>org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager,...</value>
+</property>
+<property>
+  <name>hbase.coprocessor.region.classes</name>
+  <value>org.apache.hadoop.hbase.backup.BackupObserver,...</value>
+</property>
+<property>
+  <name>hbase.master.hfilecleaner.plugins</name>
+  <value>org.apache.hadoop.hbase.backup.BackupHFileCleaner,...</value>
+</property>
+----
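Since six separate properties have to be present, a quick scan of `hbase-site.xml` can catch an omission before HBase is restarted. The sketch below is an assumption-laden helper, not an HBase utility: the function name `missing_backup_properties` is hypothetical, and it only checks that each property name appears in the file, not that the value contains the required class.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the backup-related property names absent from an hbase-site.xml file.
required_backup_properties="hbase.backup.enable
hbase.master.logcleaner.plugins
hbase.procedure.master.classes
hbase.procedure.regionserver.classes
hbase.coprocessor.region.classes
hbase.master.hfilecleaner.plugins"

missing_backup_properties() {
  local site_file="$1"
  local name
  for name in $required_backup_properties; do
    # -F matches the text literally; -q suppresses output. Values are not inspected.
    if ! grep -Fq "<name>${name}</name>" "$site_file"; then
      echo "$name"
    fi
  done
}
```

Running `missing_backup_properties <path-to-hbase-site.xml>` prints one missing property name per line and prints nothing when all six names are present.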
+
+== Backup and Restore commands
+
+This section covers the command-line utilities that administrators run to create, restore, and merge backups. Tools to
+inspect details on specific backup sessions are covered in the next section, <<br.administration,Administration of Backup Images>>.
+
+Run the command `hbase backup help <command>` to access the online help that provides
basic information about a command
+and its options. The below information is captured in this help message for each command.
+
+// hbase backup create
+
+[[br.creating.complete.backup]]
+### Creating a Backup Image
+
+[NOTE]
+====
+For HBase clusters also using Apache Phoenix: include the SQL system catalog tables in the backup. In the event that you
+need to restore the HBase backup, access to the system catalog tables enables you to resume Phoenix interoperability with the
+restored data.
+====
+
+The first step in running the backup and restore utilities is to perform a full backup and
to store the data in a separate image
+from the source. At a minimum, you must do this to get a baseline before you can rely on
incremental backups.
+
+Run the following command as HBase superuser:
+
+[source]
+----
+hbase backup create <type> <backup_path>
+----
+
+After the command finishes running, the console prints a SUCCESS or FAILURE status message.
The SUCCESS message includes a _backup_ ID.
+The backup ID is the Unix time (also known as Epoch time) that the HBase master received
the backup request from the client.
+
+[TIP]
+====
+Record the backup ID that appears at the end of a successful backup. In case the source cluster
fails and you need to recover the
+dataset with a restore operation, having the backup ID readily available can save time.
+====
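Because the backup ID embeds the time of the request, a recorded ID can be turned back into a timestamp when deciding which image to restore. The sketch below assumes the numeric suffix is expressed in milliseconds, which is inferred from the 13-digit example value `backupId_1467823988425` and is not stated by the text; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a backup ID into Unix epoch seconds.
# Assumption: the suffix after "backupId_" is epoch time in milliseconds.
backup_id_to_epoch_seconds() {
  local backup_id="$1"
  local millis="${backup_id#backupId_}"   # strip the "backupId_" prefix
  echo $(( millis / 1000 ))               # integer division drops the milliseconds
}

secs="$(backup_id_to_epoch_seconds backupId_1467823988425)"
echo "$secs"
# With GNU date, the value can then be rendered as a calendar time:
#   date -u -d "@${secs}"
```

Storing the converted timestamp next to the raw ID in your records makes it easier to pick the right image later without decoding IDs by hand.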
+
+[[br.create.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_type_::
+  The type of backup to execute: _full_ or _incremental_. As a reminder, an _incremental_
backup requires a _full_ backup to
+  already exist.
+
+_backup_path_::
+  The _backup_path_ argument specifies the full filesystem URI of where to store the backup image. Valid prefixes
+  are _hdfs:_, _webhdfs:_, _gpfs:_, and _s3fs:_.
+
+[[br.create.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+_-t <table_name[,table_name]>_::
+  A comma-separated list of tables to back up. If no tables are specified, all tables are backed up. No regular-expression or
+  wildcard support is present; all table names must be explicitly listed. See <<br.using.backup.sets,Backup Sets>> for more
+  information about performing operations on collections of tables. Mutually exclusive with the _-s_ option; one of these
+  named options is required.
+
+_-s <backup_set_name>_::
+  Identify tables to back up based on a backup set. See <<br.using.backup.sets,Using Backup Sets>> for the purpose and usage
+  of backup sets. Mutually exclusive with the _-t_ option.
+
+_-w <number_workers>_::
+  (Optional) Specifies the number of parallel workers that copy data to the backup destination. Backups are currently executed by MapReduce jobs,
+  so this value corresponds to the number of Mappers that will be spawned by the job.
+
+_-b <bandwidth_per_worker>_::
+  (Optional) Specifies the bandwidth of each worker in MB per second.
+
+_-d_::
+  (Optional) Enables "DEBUG" mode which prints additional logging about the backup creation.
+
+_-q <name>_::
+  (Optional) Specifies the name of the YARN queue in which the MapReduce job that creates the backup runs. This option
+  is useful to prevent backup tasks from stealing resources away from other MapReduce jobs of high importance.
+
+[[br.usage.examples]]
+#### Example usage
+
+[source]
+----
+$ hbase backup create full hdfs://host5:8020/data/backup -t SALES2,SALES3 -w 3
+----
+
+This command creates a full backup image of two tables, SALES2 and SALES3, in the HDFS instance whose NameNode is host5:8020
+in the path _/data/backup_. The _-w_ option specifies that no more than three parallel workers complete the operation.
+
+// hbase backup restore
+
+[[br.restoring.backup]]
+### Restoring a Backup Image
+
+Run the following command as an HBase superuser. You can only restore a backup on a running HBase cluster because the data must be
+redistributed to the RegionServers for the operation to complete successfully.
+
+[source]
+----
+hbase restore <backup_path> <backup_id>
+----
+
+[[br.restore.positional.args]]
+#### Positional Command-Line Arguments
+
+_backup_path_::
+  The _backup_path_ argument specifies the full filesystem URI of where the backup image is stored. Valid prefixes
+  are _hdfs:_, _webhdfs:_, _gpfs:_, and _s3fs:_.
+
+_backup_id_::
+  The backup ID that uniquely identifies the backup image to be restored.
+
+
+[[br.restore.named.args]]
+#### Named Command-Line Arguments
+
+_-t <table_name[,table_name]>_::
+  A comma-separated list of tables to restore. See <<br.using.backup.sets,Backup Sets>> for more
+  information about performing operations on collections of tables. Mutually exclusive with the _-s_ option; one of these
+  named options is required.
+
+_-s <backup_set_name>_::
+  Identify tables to restore based on a backup set. See <<br.using.backup.sets,Using Backup Sets>> for the purpose and usage
+  of backup sets. Mutually exclusive with the _-t_ option.
+
+_-q <name>_::
+  (Optional) Specifies the name of the YARN queue in which the MapReduce job that performs the restore runs. This option
+  is useful to prevent restore tasks from stealing resources away from other MapReduce jobs of high importance.
+
+_-c_::
+  (Optional) Perform a dry-run of the restore. The actions are checked, but not executed.
+
+_-m <target_tables>_::
+  (Optional) A comma-separated list of tables to restore into. If this option is not provided,
the original table name is used. When
+  this option is provided, there must be an equal number of entries provided in the `-t`
option.
+
+_-o_::
+  (Optional) Overwrites the target table for the restore if the table already exists.
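The pairing rule for `-m` (one target table per entry in `-t`) is easy to violate when the lists are typed by hand. The following sketch shows one way to compare the two lists before invoking the restore; both function names are hypothetical, and the check only counts entries, it does not contact a cluster or verify that the tables exist.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the -t / -m pairing rule of the restore command.

# Print the number of comma-separated entries in the first argument (0 for an empty string).
count_csv_entries() {
  printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Succeed only when the source list (-t) and the target list (-m) have the same length.
check_restore_mapping() {
  local tables="$1" targets="$2"
  local n_tables n_targets
  n_tables="$(count_csv_entries "$tables")"
  n_targets="$(count_csv_entries "$targets")"
  if [ "$n_tables" -ne "$n_targets" ]; then
    echo "mismatch: -t lists $n_tables tables, -m lists $n_targets" >&2
    return 1
  fi
}
```

For example, `check_restore_mapping mytable1,mytable2 mytable1_copy,mytable2_copy` succeeds, while passing a single target for two source tables returns a non-zero status. The `_copy` table names are illustrative only.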
+
+
+[[br.restore.usage]]
+#### Example of Usage
+
+[source]
+----
+hbase restore /tmp/backup_incremental backupId_1467823988425 -t mytable1,mytable2
+----
+
+This command restores two tables of an incremental backup image. In this example:
+
+* `/tmp/backup_incremental` is the path to the directory containing the backup image.
+* `backupId_1467823988425` is the backup ID.
+* `mytable1` and `mytable2` are the names of tables in the backup image to be restored.
+
+// hbase backup merge
+
+[[br.merge.backup]]
+### Merging Incremental Backup Images
+
+This command can be used to merge two or more incremental backup images into a single incremental
+backup image. This can be used to consolidate multiple, small incremental backup images into
a single
+larger incremental backup image. This command could be used to merge hourly incremental backups
+into a daily incremental backup image, or daily incremental backups into a weekly incremental
backup.
+
+[source]
+----
+$ hbase backup merge <backup_ids>
+----
+
+[[br.merge.backup.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_ids_::
+  A comma-separated list of incremental backup image IDs that are to be combined into a single
image.
+
+[[br.merge.backup.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+None.
+
+[[br.merge.backup.example]]
+#### Example usage
+
+[source]
+----
+$ hbase backup merge backupId_1467823988425,backupId_1467827588425
+----
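When the IDs to merge come from a file or from recorded history rather than being typed by hand, they first have to be joined into the comma-separated form that `hbase backup merge` expects. The sketch below shows one way to do that; the function name `join_backup_ids` is hypothetical, and the final line only prints the resulting command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one backup ID per line on stdin and print them comma-separated.
join_backup_ids() {
  paste -sd ',' -
}

# The two IDs from the example above, one per line.
ids="$(printf '%s\n' backupId_1467823988425 backupId_1467827588425 | join_backup_ids)"

# Print the command for review instead of executing it.
echo "hbase backup merge $ids"
```

Printing the command first gives the operator a chance to confirm that only incremental backup IDs are in the list before the merge is actually run.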
+
+// hbase backup set
+
+[[br.using.backup.sets]]
+### Using Backup Sets
+
+Backup sets can ease the administration of HBase data backups and restores by reducing the amount of repetitive input
+of table names. You can group tables into a named backup set with the `hbase backup set add` command. You can then use
+the `-s` option to invoke the name of a backup set in the `hbase backup create` or `hbase restore` command rather than list
+individually every table in the group. You can have multiple backup sets.
+
+NOTE: Note the differentiation between the `hbase backup set add` command and the _-s_ option. The `hbase backup set add`
+command must be run before using the `-s` option in a different command because backup sets must be named and defined
+before using backup sets as a shortcut.
+
+If you run the `hbase backup set add` command and specify a backup set name that does not yet exist on your system, a new set
+is created. If you run the command with the name of an existing backup set, then the tables that you specify are added
+to the set.
+
+In this command, the backup set name is case-sensitive.
+
+NOTE: The metadata of backup sets is stored within HBase. If you do not have access to the original HBase cluster with the
+backup set metadata, then you must specify individual table names to restore the data.
+
+To create a backup set, run the following command as the HBase superuser:
+
+[source]
+----
+$ hbase backup set <subcommand> <backup_set_name> <tables>
+----
+
+[[br.set.subcommands]]
+#### Backup Set Subcommands
+
+The following list details subcommands of the `hbase backup set` command.
+
+NOTE: You must enter one (and no more than one) of the following subcommands after `hbase backup set` to complete an operation.
+Also, the backup set name is case-sensitive in the command-line utility.
+
+_add_::
+  Adds table[s] to a backup set. Specify a _backup_set_name_ value after this argument to
create a backup set.
+
+_remove_::
+  Removes tables from the set. Specify the tables to remove in the tables argument.
+
+_list_::
+  Lists all backup sets.
+
+_describe_::
+  Displays a description of a backup set. The information includes whether the set has full
+  or incremental backups, start and end times of the backups, and a list of the tables in
the set. This subcommand must precede
+  a valid value for the _backup_set_name_ value.
+
+_delete_::
+  Deletes a backup set. Enter the value for the _backup_set_name_ option directly after the
`hbase backup set delete` command.
+
+[[br.set.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_set_name_::
+  Use to assign or invoke a backup set name. The backup set name must contain only printable
characters and cannot have any spaces.
+
+_tables_::
+  List of tables (or a single table) to include in the backup set. Enter the table names
as a comma-separated list. If no tables
+  are specified, all tables are included in the set.
+
+TIP: Maintain a log or other record of the case-sensitive backup set names and the corresponding tables in each set on a separate
+or remote cluster as part of your backup strategy. This information can help you in case of failure on the primary cluster.
+
+[[br.set.usage]]
+#### Example of Usage
+
+[source]
+----
+$ hbase backup set add Q1Data TEAM_3,TEAM_4
+----
+
+Depending on the environment, this command results in _one_ of the following actions:
+
+* If the `Q1Data` backup set does not exist, a backup set containing tables `TEAM_3` and
`TEAM_4` is created.
+* If the `Q1Data` backup set exists already, the tables `TEAM_3` and `TEAM_4` are added to
the `Q1Data` backup set.
+
+[[br.administration]]
+## Administration of Backup Images
+
+The `hbase backup` command has several subcommands that help with administering backup images
as they accumulate. Most production
+environments require recurring backups, so it is necessary to have utilities to help manage
the data of the backup repository.
+Some subcommands enable you to find information that can help identify backups that are relevant
in a search for particular data.
+You can also delete backup images.
+
+The following list details each `hbase backup` subcommand that can help administer backups. Run the full command-subcommand line as
+the HBase superuser.
+
+// hbase backup progress
+
+[[br.managing.backup.progress]]
+### Managing Backup Progress
+
+You can monitor a running backup in another terminal session by running the _hbase backup
progress_ command and specifying the backup ID as an argument.
+
+For example, run the following command as the HBase superuser to view the progress of a backup:
+
+[source]
+----
+$ hbase backup progress <backup_id>
+----
+
+[[br.progress.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_id_::
+  Specifies the backup that you want to monitor by seeing the progress information. The backupId
is case-sensitive.
+
+[[br.progress.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+None.
+
+[[br.progress.example]]
+#### Example usage
+
+[source]
+----
+hbase backup progress backupId_1467823988425
+----
+
+// hbase backup history
+
+[[br.managing.backup.history]]
+### Managing Backup History
+
+This command displays a log of backup sessions. The information for each session includes
backup ID, type (full or incremental), the tables
+in the backup, status, and start and end time. Specify the number of backup sessions to display
with the optional -n argument.
+
+[source]
+----
+$ hbase backup history <backup_id>
+----
+
+[[br.history.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_id_::
+  Specifies the backup that you want to monitor by seeing the progress information. The backupId
is case-sensitive.
+
+[[br.history.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+_-n <num_records>_::
+  (Optional) The maximum number of backup records (Default: 10).
+
+_-p <backup_root_path>_::
+  The full filesystem URI of where backup images are stored.
+
+_-s <backup_set_name>_::
+  The name of the backup set to obtain history for. Mutually exclusive with the _-t_ option.
+
+_-t <table_name>_::
+  The name of the table to obtain history for. Mutually exclusive with the _-s_ option.
+
+[[br.history.backup.example]]
+#### Example usage
+
+[source]
+----
+$ hbase backup history
+$ hbase backup history -n 20
+$ hbase backup history -t WebIndexRecords
+----
+
+// hbase backup describe
+
+[[br.describe.backup]]
+### Describing a Backup Image
+
+This command can be used to obtain information about a specific backup image.
+
+[source]
+----
+$ hbase backup describe <backup_id>
+----
+
+[[br.describe.backup.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_id_::
+  The ID of the backup image to describe.
+
+[[br.describe.backup.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+None.
+
+[[br.describe.backup.example]]
+#### Example usage
+
+[source]
+----
+$ hbase backup describe backupId_1467823988425
+----
+
+// hbase backup delete
+
+[[br.delete.backup]]
+### Deleting a Backup Image
+
+This command can be used to delete a backup image which is no longer needed.
+
+[source]
+----
+$ hbase backup delete <backup_id>
+----
+
+[[br.delete.backup.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+_backup_id_::
+  The ID of the backup image which should be deleted.
+
+[[br.delete.backup.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+None.
+
+[[br.delete.backup.example]]
+#### Example usage
+
+[source]
+----
+$ hbase backup delete backupId_1467823988425
+----
+
+// hbase backup repair
+
+[[br.repair.backup]]
+### Backup Repair Command
+
+This command attempts to correct any inconsistencies in persisted backup metadata which exist as
+the result of software errors or unhandled failure scenarios. While the backup implementation tries
+to correct all errors on its own, this tool may be necessary in the cases where the system cannot
+automatically recover on its own.
+
+[source]
+----
+$ hbase backup repair
+----
+
+[[br.repair.backup.positional.cli.arguments]]
+#### Positional Command-Line Arguments
+
+None.
+
+[[br.repair.backup.named.cli.arguments]]
+#### Named Command-Line Arguments
+
+None.
+
+[[br.repair.backup.example]]
+#### Example usage
+
+[source]
+----
+$ hbase backup repair
+----
+
+[[br.backup.configuration]]
+## Configuration keys
+
+The backup and restore feature includes both required and optional configuration keys.
+
+### Required properties
+
+_hbase.backup.enable_: Controls whether or not the feature is enabled (Default: `false`).
Set this value to `true`.
+
+_hbase.master.logcleaner.plugins_: A comma-separated list of classes invoked when cleaning
logs in the HBase Master. Set
+this value to `org.apache.hadoop.hbase.backup.master.BackupLogCleaner` or append it to the
current value.
+
+_hbase.procedure.master.classes_: A comma-separated list of classes invoked with the Procedure
framework in the Master. Set
+this value to `org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager` or append
it to the current value.
+
+_hbase.procedure.regionserver.classes_: A comma-separated list of classes invoked with the
Procedure framework in the RegionServer.
+Set this value to `org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager`
or append it to the current value.
+
+_hbase.coprocessor.region.classes_: A comma-separated list of RegionObservers deployed on
tables. Set this value to
+`org.apache.hadoop.hbase.backup.BackupObserver` or append it to the current value.
+
+_hbase.master.hfilecleaner.plugins_: A comma-separated list of HFileCleaners deployed on
the Master. Set this value
+to `org.apache.hadoop.hbase.backup.BackupHFileCleaner` or append it to the current value.
+
+### Optional properties
+
+_hbase.backup.system.ttl_: The time-to-live in seconds of data in the `hbase:backup` tables
(default: forever). This property
+is only relevant prior to the creation of the `hbase:backup` table. Use the `alter` command
in the HBase shell to modify the TTL
+when this table already exists. See the <<br.filesystem.growth.warning,below section>>
for more details on the impact of this
+configuration property.
+
+_hbase.backup.attempts.max_: The number of attempts to perform when taking HBase table snapshots (default: 10).
+
+_hbase.backup.attempts.pause.ms_: The amount of time to wait between failed snapshot attempts
in milliseconds (default: 10000).
+
+_hbase.backup.logroll.timeout.millis_: The amount of time (in milliseconds) to wait for RegionServers
to execute a WAL rolling
+in the Master's procedure framework (default: 30000).
+
+[[br.best.practices]]
+## Best Practices
+
+### Formulate a restore strategy and test it.
+
+Before you rely on a backup and restore strategy for your production environment, identify
how backups must be performed,
+and more importantly, how restores must be performed. Test the plan to ensure that it is
workable.
+At a minimum, store backup data from a production cluster on a different cluster or server.
To further safeguard the data,
+use a backup location that is at a different physical location.
+
+If you have an unrecoverable loss of data on your primary production cluster as a result of computer system issues, you may
+be able to restore the data from a different cluster or server at the same site. However,
a disaster that destroys the whole
+site renders locally stored backups useless. Consider storing the backup data and necessary
resources (both computing capacity
+and operator expertise) to restore the data at a site sufficiently remote from the production
site. In the case of a catastrophe
+at the whole primary site (fire, earthquake, etc.), the remote backup site can be very valuable.
+
+### Secure a full backup image first.
+
+As a baseline, you must complete a full backup of HBase data at least once before you can rely on incremental backups. The full
+backup should be stored outside of the source cluster. To ensure complete dataset recovery, you must run the restore utility
+with the option to restore the baseline full backup. The full backup is the foundation of your dataset. Incremental backup data
+is applied on top of the full backup during the restore operation to return you to the point in time when the backup was last taken.
+
+### Define and use backup sets for groups of tables that are logical subsets of the entire
dataset.
+
+You can group tables into an object called a backup set. A backup set can save time when
you have a particular group of tables
+that you expect to repeatedly back up or restore.
+
+When you create a backup set, you type table names to include in the group. The backup set
includes not only groups of related
+tables, but also retains the HBase backup metadata. Afterwards, you can invoke the backup
set name to indicate what tables apply
+to the command execution instead of entering all the table names individually.
+
+### Document the backup and restore strategy, and ideally log information about each backup.
+
+Document the whole process so that the knowledge base can transfer to new administrators
after employee turnover. As an extra
+safety precaution, also log the calendar date, time, and other relevant details about the
data of each backup. This metadata
+can potentially help locate a particular dataset in case of source cluster failure or primary
site disaster. Maintain duplicate
+copies of all documentation: one copy at the production cluster site and another at the backup
location or wherever it can be
+accessed by an administrator remotely from the production cluster.
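The per-backup log recommended above can be as simple as an append-only text file kept outside the production cluster. The sketch below records one line per backup; the function name `log_backup_record`, the tab-separated layout, and the choice of fields are all assumptions made for illustration, not a format that HBase defines or reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one tab-separated record per backup to a log file.
# Fields: UTC time the record was written, backup ID, backup type, backup destination.
log_backup_record() {
  local log_file="$1" backup_id="$2" backup_type="$3" destination="$4"
  printf '%s\t%s\t%s\t%s\n' \
    "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
    "$backup_id" "$backup_type" "$destination" >> "$log_file"
}
```

A call such as `log_backup_record backups.log backupId_1467823988425 full hdfs://host5:8020/data/backup` appends a single line; copying the file to the backup location keeps the second copy that the text recommends.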
+
+[[br.s3.backup.scenario]]
+## Scenario: Safeguarding Application Datasets on Amazon S3
+
+This scenario describes how a hypothetical retail business uses backups to safeguard application
data and then restore the dataset
+after failure.
+
+The HBase administration team uses backup sets to store data from a group of tables that
have interrelated information for an
+application called green. In this example, one table contains transaction records and the
other contains customer details. The
+two tables need to be backed up and be recoverable as a group.
+
+The admin team also wants to ensure daily backups occur automatically.
+
+.Tables Composing The Backup Set
+image::backup-app-components.png[]
+
+The following is an outline of the steps and examples of commands that are used to back up the data for the _green_ application and
+to recover the data later. All commands are run when logged in as HBase superuser.
+
+1. A backup set called _green_set_ is created as an alias for both the transactions table
and the customer table. The backup set can
+be used for all operations to avoid typing each table name. The backup set name is case-sensitive
and should be formed with only
+printable characters and without spaces.
+
+[source]
+----
+$ hbase backup set add green_set transactions
+$ hbase backup set add green_set customer
+----
+
+2. The first backup of green_set data must be a full backup. The following command example
shows how credentials are passed to Amazon
+S3 and specifies the file system with the s3a: prefix.
+
+[source]
+----
+$ ACCESS_KEY=ABCDEFGHIJKLMNOPQRST
+$ SECRET_KEY=123456789abcdefghijklmnopqrstuvwxyzABCD
+$ sudo -u hbase hbase backup create full \
+  s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups -s green_set
+----
+
+3. Incremental backups should be run according to a schedule that ensures essential data
recovery in the event of a catastrophe. At
+this retail company, the HBase admin team decides that automated daily backups secure the
data sufficiently. The team decides that
+they can implement this by modifying an existing Cron job that is defined in `/etc/crontab`.
Consequently, IT modifies the Cron job
+by adding the following line:
+
+[source]
+----
+@daily hbase hbase backup create incremental s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups
-s green_set
+----
+
+4. A catastrophic IT incident disables the production cluster that the green application
uses. An HBase system administrator of the
+backup cluster must restore the _green_set_ dataset to the point in time closest to the recovery
objective.
+
+NOTE: If the administrator of the backup HBase cluster has the backup ID with relevant details
in accessible records, the following
+search with the `hdfs dfs -ls` command and manually scanning the backup ID list can be bypassed.
Consider continuously maintaining
+and protecting a detailed log of backup IDs outside the production cluster in your environment.
+
+The HBase administrator runs the following command on the directory where backups are stored
to print the list of successful backup
+IDs on the console:
+
+`hdfs dfs -ls -t /prodhbasebackups/backups`
+
+5. The admin scans the list to see which backup was created at a date and time closest to
the recovery objective. To do this, the
+admin converts the calendar timestamp of the recovery point in time to Unix time because
backup IDs are uniquely identified with
+Unix time. The backup IDs are listed in reverse chronological order, meaning the most recent
successful backup appears first.
+
+The admin notices that the following line in the command output corresponds with the _green_set_
backup that needs to be restored:
+
+`/prodhbasebackups/backups/backup_1467823988425`
+
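+The timestamp conversion and the scan of the list can be sketched in shell. This is a hedged illustration only: it assumes GNU `date` (for the `-d` option), assumes the numeric suffix of a backup ID is Unix time in milliseconds (as the 13-digit sample ID suggests), and uses the hypothetical helper names `to_epoch_ms` and `closest_backup_before`.
+
```shell
#!/usr/bin/env bash
# Convert a UTC calendar timestamp to Unix time in milliseconds.
# Requires GNU date for the -d option.
to_epoch_ms() {
  echo "$(( $(date -u -d "$1" +%s) * 1000 ))"
}

# Read listing lines on stdin and print the newest backup ID whose
# timestamp is at or before the target (first argument, in milliseconds).
# Prints nothing if no backup is old enough.
closest_backup_before() {
  local target_ms="$1"
  # Keep only the trailing backup_<digits> component of each line.
  sed -n 's/.*\(backup_[0-9][0-9]*\)$/\1/p' |
    awk -F_ -v t="$target_ms" '
      $2 + 0 <= t + 0 && $2 + 0 > best { best = $2 + 0; id = $0 }
      END { if (id != "") print id }'
}
```
+
+Piping the output of the listing command into `closest_backup_before "$(to_epoch_ms '2016-07-06 17:00:00')"` would select `backup_1467823988425` if that is the latest backup taken before 17:00 UTC on that date.
+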
+6. The admin restores green_set by specifying the backup ID and the -overwrite option. The -overwrite
option truncates all existing data
+in the destination and populates the tables with data from the backup dataset. Without this
flag, the backup data is appended to the
+existing data in the destination. In this case, the admin decides to overwrite the data because
it is corrupted.
+
+[source]
+----
+$ sudo -u hbase hbase restore -s green_set \
+  s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups backup_1467823988425 -overwrite
+----
+
+[[br.data.security]]
+## Security of Backup Data
+
+Because this feature copies data to remote locations, it is important to state clearly the
procedural concerns that exist around data security.
+Like the HBase replication feature, backup and restore provides the constructs to automatically
+copy data from within a corporate boundary to some system outside of that boundary. When storing
sensitive data, it is imperative that with backup and restore, as with any
+feature which extracts data from HBase, every location to which data is sent has undergone
a security audit to ensure
+that only authenticated users are allowed to access that data.
+
+In the above example of backing up data to S3, it is of the utmost importance
that the proper permissions are assigned
+to the S3 bucket to ensure that only a minimum set of authorized users are allowed to access
this data. Because the data is no longer
+being accessed via HBase, and its authentication and authorization controls, we must ensure
that the filesystem storing that data is
+providing a comparable level of security. This is a manual step which users *must* implement
on their own.
+
+[[br.technical.details]]
+## Technical Details of Incremental Backup and Restore
+
+HBase incremental backups enable more efficient capture of HBase table images than previous
attempts at serial backup and restore
+solutions, such as those that only used HBase Export and Import APIs. Incremental backups
use Write Ahead Logs (WALs) to capture
+the data changes since the previous backup was created. A WAL roll (create new WALs) is executed
across all RegionServers to track
+the WALs that need to be in the backup.
+
+After the incremental backup image is created, the source backup files are usually on the same
node as the data source. A process similar
+to the DistCp (distributed copy) tool is used to move the source backup files to the target
file systems. When a table restore operation
+starts, a two-step process is initiated. First, the full backup is restored from the full
backup image. Second, all WAL files from
+incremental backups between the last full backup and the incremental backup being restored
are converted to HFiles, which the HBase
+Bulk Load utility automatically imports as restored data in the table.
+
+You can only restore on a live HBase cluster because the data must be redistributed to complete
the restore operation successfully.
+
+[[br.filesystem.growth.warning]]
+## A Warning on File System Growth
+
+As a reminder, incremental backups are implemented via retaining the write-ahead logs which
HBase primarily uses for data durability.
+Thus, to ensure that all data needing to be included in a backup is still available in the
system, the HBase backup and restore feature
+retains all write-ahead logs since the last backup until the next incremental backup is executed.
+
+Like HBase Snapshots, this can have an unexpectedly large impact on the HDFS usage of HBase
for high volume tables. Take care in enabling
+and using the backup and restore feature, specifically with a mind to removing backup sessions
when they are not actively being used.
+
+The only automated upper bound on retained write-ahead logs for backup and restore is based
on the TTL of the `hbase:backup` system table which,
+as of the time this document is written, is infinite (backup table entries are never automatically
deleted). This requires that administrators
+perform backups on a schedule whose frequency is relative to the amount of available space
on HDFS (e.g. less available HDFS space requires
+more aggressive backup merges and deletions). As a reminder, the TTL can be altered on the
`hbase:backup` table using the `alter` command
+in the HBase shell. Modifying the configuration property `hbase.backup.system.ttl` in hbase-site.xml
after the system table exists has no effect.
+
+[[br.backup.capacity.planning]]
+## Capacity Planning
+
+When designing a distributed system deployment, it is critical that some basic mathematical
rigor is executed to ensure sufficient computational
+capacity is available given the data and software requirements of the system. For this feature,
the availability of network capacity is the largest
+bottleneck when estimating the performance of some implementation of backup and restore.
The second most costly function is the speed at which
+data can be read/written.
+
+### Full Backups
+
+To estimate the duration of a full backup, we have to understand the general actions which
are invoked:
+
+* Write-ahead log roll on each RegionServer: ones to tens of seconds per RegionServer in
parallel. Relative to the load on each RegionServer.
+* Take an HBase snapshot of the table(s): tens of seconds. Relative to the number of regions
and files that comprise the table.
+* Export the snapshot to the destination: see below. Relative to the size of the data and
the network bandwidth to the destination.
+
+[[br.export.snapshot.cost]]
+To approximate how long the final step will take, we have to make some assumptions on hardware.
Be aware that these will *not* be accurate for your
+system -- these are numbers that you or your administrator know for your system. Let's say
the speed of reading data from HDFS on a single node is
+capped at 80MB/s (across all Mappers that run on that host), a modern network interface controller
(NIC) supports 10Gb/s, the top-of-rack switch can
+handle 40Gb/s, and the WAN between your clusters is 10Gb/s. This means that you can only
ship data to your remote at a speed of 1.25GB/s -- meaning
+that 16 nodes (`1.25 * 1024 / 80 = 16`) participating in the ExportSnapshot should be able
to fully saturate the link between clusters. With more
+nodes in the cluster, we can still saturate the network but at a lesser impact on any one
node which helps ensure local SLAs are met. If the size
+of the snapshot is 10TB, this full backup would take in the ballpark of 2.5 hours
(`10 * 1024 / 1.25 / (60 * 60) = 2.28hrs`).
+
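+The estimate above can be reproduced with a few lines of shell. This is only a sketch under the same illustrative assumptions (an 80MB/s per-node read cap and a 1.25GB/s link); the function names are hypothetical and the inputs should be replaced with numbers measured on your own system.
+
```shell
#!/usr/bin/env bash
# Number of nodes needed to saturate the inter-cluster link, rounded up.
# usage: nodes_to_saturate <link_GB_per_s> <per_node_read_MB_per_s>
nodes_to_saturate() {
  awk -v link="$1" -v node="$2" 'BEGIN {
    n = link * 1024 / node   # link speed in MB/s divided by per-node speed
    c = int(n)
    if (c < n) c++           # round up to a whole node
    print c
  }'
}

# Hours needed to ship a snapshot over a saturated link.
# usage: full_backup_hours <snapshot_TB> <link_GB_per_s>
full_backup_hours() {
  awk -v tb="$1" -v link="$2" 'BEGIN {
    printf "%.2f\n", tb * 1024 / link / 3600   # TB -> GB, then seconds -> hours
  }'
}
```
+
+With the figures used above, `nodes_to_saturate 1.25 80` prints `16` and `full_backup_hours 10 1.25` prints `2.28`.
+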
+As a general statement, it is very likely that the WAN bandwidth between your local cluster
and the remote storage is the largest
+bottleneck to the speed of a full backup.
+
+When the concern is restricting the computational impact of backups to a "production system",
the above formulas can be reused with the optional
+command-line arguments to `hbase backup create`: `-b`, `-w`, `-q`. The `-b` option defines
the bandwidth at which each worker (Mapper) would
+write data. The `-w` argument limits the number of workers that would be spawned in the DistCp
job. The `-q` allows the user to specify a YARN
+queue which can limit the specific nodes where the workers will be spawned -- this can quarantine
the backup workers performing the copy to
+a set of non-critical nodes. Relating the `-b` and `-w` options to our earlier equations:
`-b` would be used to restrict each node from reading
+data at the full 80MB/s and `-w` is used to limit the job from spawning 16 worker tasks.
+
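+To relate these options to the earlier numbers, the product of `-w` and `-b` gives an upper bound on the aggregate copy rate. The sketch below assumes that `-b` is expressed in MB/s per worker; the helper names are hypothetical, and the second function only prints a command line rather than running it.
+
```shell
#!/usr/bin/env bash
# Upper bound on aggregate copy throughput: workers x per-worker bandwidth.
# usage: aggregate_cap_mb_per_s <workers> <mb_per_s_per_worker>
aggregate_cap_mb_per_s() {
  echo $(( $1 * $2 ))
}

# Print (do not run) a throttled full-backup command line.
# usage: throttled_backup_cmd <backup_root> <backup_set> <workers> <mb_per_s_per_worker>
throttled_backup_cmd() {
  echo "hbase backup create full $1 -s $2 -w $3 -b $4"
}
```
+
+For instance, `aggregate_cap_mb_per_s 8 40` prints `320`: eight workers at 40MB/s each would use at most a quarter of the 1280MB/s link from the example, at the cost of a proportionally longer backup.
+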
+### Incremental Backup
+
+Like we did for full backups, we have to understand the incremental backup process to approximate
its runtime and cost.
+
+* Identify new write-ahead logs since last full or incremental backup: negligible. A priori
knowledge from the backup system table(s).
+* Read, filter, and write "minimized" HFiles equivalent to the WALs: dominated by the speed
of writing data. Relative to write speed of HDFS.
+* DistCp the HFiles to the destination: <<br.export.snapshot.cost,see above>>.
+
+For the second step, the dominating cost of this operation would be re-writing the data
(under the assumption that a majority of the
+data in the WAL is preserved). In this case, we can assume an aggregate write speed of 30MB/s
per node. Continuing our 16-node cluster example,
+this would require approximately 15 minutes to perform this step for 50GB of data
(`50 * 1024 / 60 / 60 = 14.2`, that is, 51,200MB at an effective 60MB/s, converted from seconds to minutes).
+The amount of time to start the DistCp MapReduce job would likely dominate the actual time
taken to copy the data (`50 / 1.25 = 40` seconds), so the copy time itself can be ignored.
+
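+Because the result is very sensitive to the write rate that is actually achieved, the sketch below leaves the effective rate as a parameter instead of fixing it. The function names are hypothetical and the inputs are the illustrative figures from this section, not measurements.
+
```shell
#!/usr/bin/env bash
# Minutes needed to re-write WAL data as HFiles at a given effective rate.
# usage: rewrite_minutes <data_GB> <effective_write_MB_per_s>
rewrite_minutes() {
  awk -v gb="$1" -v rate="$2" 'BEGIN {
    printf "%.1f\n", gb * 1024 / rate / 60   # GB -> MB, then seconds -> minutes
  }'
}

# Seconds needed to copy the resulting files over the link.
# usage: copy_seconds <data_GB> <link_GB_per_s>
copy_seconds() {
  awk -v gb="$1" -v link="$2" 'BEGIN { printf "%.0f\n", gb / link }'
}
```
+
+`rewrite_minutes 50 60` prints `14.2`, while `rewrite_minutes 50 30` prints `28.4`; `copy_seconds 50 1.25` prints `40`.
+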
+[[br.limitations]]
+## Limitations of the Backup and Restore Utility
+
+*Serial backup operations*
+
+Backup operations cannot be run concurrently. An operation includes actions like create,
delete, restore, and merge. Only one active backup session is supported. link:https://issues.apache.org/jira/browse/HBASE-16391[HBASE-16391]
+will introduce multiple-backup sessions support.
+
+*No means to cancel backups*
+
+Neither backup nor restore operations can be canceled (link:https://issues.apache.org/jira/browse/HBASE-15997[HBASE-15997],
link:https://issues.apache.org/jira/browse/HBASE-15998[HBASE-15998]).
+The workaround to cancel a backup would be to kill the client-side backup command (`control-C`),
ensure all relevant MapReduce jobs have exited, and then
+run the `hbase backup repair` command to ensure the system backup metadata is consistent.
+
+*Backups can only be saved to a single location*
+
+Copying backup information to multiple locations is an exercise left to the user. link:https://issues.apache.org/jira/browse/HBASE-15476[HBASE-15476]
will
+introduce the ability to specify multiple-backup destinations intrinsically.
+
+*HBase superuser access is required*
+
+Only an HBase superuser (e.g. hbase) is allowed to perform backup/restore, which can pose a problem
for shared HBase installations. Current mitigations would require
+coordination with system administrators to build and deploy a backup and restore strategy
(link:https://issues.apache.org/jira/browse/HBASE-14138[HBASE-14138]).
+
+*Backup restoration is an online operation*
+
+Restoring from a backup requires that the HBase cluster is online; this is a caveat
of the current implementation (link:https://issues.apache.org/jira/browse/HBASE-16573[HBASE-16573]).
+
+*Some operations may fail and require re-run*
+
+The HBase backup feature is primarily client driven. While there is the standard HBase retry
logic built into the HBase Connection, persistent errors in executing operations
+may propagate back to the client (e.g. snapshot failure due to region splits). The backup
implementation should be moved from client-side into the ProcedureV2 framework
+in the future which would provide additional robustness around transient/retryable failures.
The `hbase backup repair` command is meant to correct states which the system
+cannot automatically detect and recover from.
+
+*Avoidance of declaration of public API*
+
+While the Java API to interact with this feature exists and its implementation is separated
from an interface, insufficient rigor has been applied to determine if
+it is exactly what we intend to ship to users. As such, it is marked for a `Private` audience
with the expectation that, as users begin to try the feature, there
+will be modifications that would necessitate breaking compatibility (link:https://issues.apache.org/jira/browse/HBASE-17517[HBASE-17517]).
+
+*Lack of global metrics for backup and restore*
+
+Individual backup and restore operations contain metrics about the amount of work the operation
included, but there is no centralized location (e.g. the Master UI)
+which presents this information for consumption (link:https://issues.apache.org/jira/browse/HBASE-16565[HBASE-16565]).

http://git-wip-us.apache.org/repos/asf/hbase/blob/8f806ab4/src/main/asciidoc/book.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/book.adoc b/src/main/asciidoc/book.adoc
index 519cf9a..1bc9ed7 100644
--- a/src/main/asciidoc/book.adoc
+++ b/src/main/asciidoc/book.adoc
@@ -19,7 +19,7 @@
  */
 ////
 
-= Apache HBase (TM) Reference Guide 
+= Apache HBase (TM) Reference Guide
 :Author: Apache HBase Team
 :Email: <hbase-dev@lists.apache.org>
 :doctype: book
@@ -62,6 +62,7 @@ include::_chapters/mapreduce.adoc[]
 include::_chapters/security.adoc[]
 include::_chapters/architecture.adoc[]
 include::_chapters/hbase_mob.adoc[]
+include::_chapters/backup_restore.adoc[]
 include::_chapters/hbase_apis.adoc[]
 include::_chapters/external_apis.adoc[]
 include::_chapters/thrift_filter_language.adoc[]
@@ -93,5 +94,3 @@ include::_chapters/asf.adoc[]
 include::_chapters/orca.adoc[]
 include::_chapters/tracing.adoc[]
 include::_chapters/rpc.adoc[]
-
-

http://git-wip-us.apache.org/repos/asf/hbase/blob/8f806ab4/src/main/site/resources/images/backup-app-components.png
----------------------------------------------------------------------
diff --git a/src/main/site/resources/images/backup-app-components.png b/src/main/site/resources/images/backup-app-components.png
new file mode 100644
index 0000000..5e403e2
Binary files /dev/null and b/src/main/site/resources/images/backup-app-components.png differ

http://git-wip-us.apache.org/repos/asf/hbase/blob/8f806ab4/src/main/site/resources/images/backup-cloud-appliance.png
----------------------------------------------------------------------
diff --git a/src/main/site/resources/images/backup-cloud-appliance.png b/src/main/site/resources/images/backup-cloud-appliance.png
new file mode 100644
index 0000000..76b6d5a
Binary files /dev/null and b/src/main/site/resources/images/backup-cloud-appliance.png differ

http://git-wip-us.apache.org/repos/asf/hbase/blob/8f806ab4/src/main/site/resources/images/backup-dedicated-cluster.png
----------------------------------------------------------------------
diff --git a/src/main/site/resources/images/backup-dedicated-cluster.png b/src/main/site/resources/images/backup-dedicated-cluster.png
new file mode 100644
index 0000000..bca282d
Binary files /dev/null and b/src/main/site/resources/images/backup-dedicated-cluster.png differ

http://git-wip-us.apache.org/repos/asf/hbase/blob/8f806ab4/src/main/site/resources/images/backup-intra-cluster.png
----------------------------------------------------------------------
diff --git a/src/main/site/resources/images/backup-intra-cluster.png b/src/main/site/resources/images/backup-intra-cluster.png
new file mode 100644
index 0000000..113c577
Binary files /dev/null and b/src/main/site/resources/images/backup-intra-cluster.png differ

