falcon-commits mailing list archives

From b...@apache.org
Subject falcon git commit: FALCON-2027 Enhance documentation on data replication from HDP to Azure
Date Fri, 17 Jun 2016 18:05:54 GMT
Repository: falcon
Updated Branches:
  refs/heads/0.10 196a76bfd -> bdda78ca8

FALCON-2027 Enhance documentation on data replication from HDP to Azure

Also fixed a typo in FalconDocumentation.twiki.

Author: yzheng-hortonworks <yzheng@hortonworks.com>

Reviewers: "Balu Vellanki <balu@apache.org>"

Closes #187 from yzheng-hortonworks/FALCON-2027

(cherry picked from commit 037e6821b6eb7dbad1c0b1a3508aa6715e77e454)
Signed-off-by: bvellanki <bvellanki@hortonworks.com>

Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/bdda78ca
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/bdda78ca
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/bdda78ca

Branch: refs/heads/0.10
Commit: bdda78ca83e7778acda2f80f65d0e635abe3d043
Parents: 196a76b
Author: yzheng-hortonworks <yzheng@hortonworks.com>
Authored: Fri Jun 17 11:05:37 2016 -0700
Committer: bvellanki <bvellanki@hortonworks.com>
Committed: Fri Jun 17 11:05:51 2016 -0700

 docs/src/site/twiki/DataReplicationAzure.twiki | 61 +++++++++++++++++++++
 docs/src/site/twiki/FalconDocumentation.twiki  |  4 +-
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/docs/src/site/twiki/DataReplicationAzure.twiki b/docs/src/site/twiki/DataReplicationAzure.twiki
new file mode 100644
index 0000000..24e543b
--- /dev/null
+++ b/docs/src/site/twiki/DataReplicationAzure.twiki
@@ -0,0 +1,61 @@
+---+ Data Replication between On-premise Hadoop Clusters and Azure Cloud
+---++ Overview
+Falcon provides an easy way to replicate data between on-premise Hadoop clusters and Azure cloud.
+With this feature, users can build a hybrid data pipeline,
+e.g. processing sensitive data on-premises for privacy and compliance reasons
+while leveraging the cloud for elastic scale and online services (e.g. Azure machine learning) with non-sensitive data.
+---++ Use Case
+1. Copy data from on-premise Hadoop clusters to Azure cloud
+2. Copy data from Azure cloud to on-premise Hadoop clusters
+3. Copy data within Azure cloud (i.e. from one Azure location to another).
+---++ Usage
+---+++ Set Up Azure Blob Credentials
+To move data to/from Azure blobs, we need to add Azure blob credentials in HDFS.
+This can be done by adding the credential property through Ambari HDFS configs; HDFS needs to be restarted after adding the credential.
+You can also add the credential property to core-site.xml directly, but make sure you restart HDFS from the command line instead of Ambari.
+Otherwise, Ambari will apply its previous HDFS configuration, which does not contain your Azure blob credentials.
+    <property>
+      <name>fs.azure.account.key.{AZURE_BLOB_ACCOUNT_NAME}.blob.core.windows.net</name>
+      <value>{AZURE_BLOB_ACCOUNT_KEY}</value>
+    </property>
+To verify that the Azure credential is set up properly, check that you can access the Azure blob through HDFS, e.g.
+hadoop fs -ls wasb://{AZURE_BLOB_CONTAINER}@{AZURE_BLOB_ACCOUNT_NAME}.blob.core.windows.net/
+---+++ Replication Feed
+[[EntitySpecification][Falcon replication feed]] can be used for data replication to/from Azure cloud.
+You can specify a WASB (i.e. Windows Azure Storage Blob) URL in the source or target locations.
+See below for an example of data replication from a Hadoop cluster to Azure blob.
+Note that the clusters for the source and the target need to be different.
+Analogously, if you want to copy data from Azure blob, you can add the Azure blob location to the source.
+<?xml version="1.0" encoding="UTF-8"?>
+<feed name="AzureReplication" xmlns="uri:falcon:feed:0.1">
+    <frequency>months(1)</frequency>
+    <clusters>
+        <cluster name="SampleCluster1" type="source">
+            <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/>
+            <retention limit="days(90)" action="delete"/>
+        </cluster>
+        <cluster name="SampleCluster2" type="target">
+            <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/>
+            <retention limit="days(90)" action="delete"/>
+            <locations>
+                <location type="data" path="wasb://replication-test@mystorage.blob.core.windows.net/replicated-${YEAR}-${MONTH}"/>
+            </locations>
+        </cluster>
+    </clusters>
+    <locations>
+        <location type="data" path="/apps/falcon/demo/data-${YEAR}-${MONTH}" />
+    </locations>
+    <ACL owner="ambari-qa" group="users" permission="0755"/>
+    <schema location="hcat" provider="hcat"/>
+</feed>

diff --git a/docs/src/site/twiki/FalconDocumentation.twiki b/docs/src/site/twiki/FalconDocumentation.twiki
index 4848746..fe1c0de 100644
--- a/docs/src/site/twiki/FalconDocumentation.twiki
+++ b/docs/src/site/twiki/FalconDocumentation.twiki
@@ -447,9 +447,11 @@ cluster, (no dirty reads)
 ---+++ Archival as Replication
-Falcon allows users to archive data from on-premice to cloud, either Azure WASB or S3.
+Falcon allows users to archive data from on-premise to cloud, either Azure WASB or S3.
 It uses the underlying replication for archiving data from source to target. The archival URI is
 specified as the overridden location for the target cluster.
+Note that for data replication between on-premise and Azure cloud, Azure credentials need to be added to core-site.xml.
+Please refer to [[DataReplicationAzure][AzureDataReplication]] for details and examples.
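
The new DataReplicationAzure.twiki page says that copying data from Azure blob works analogously, by adding the Azure blob location to the source. A minimal sketch of such a feed follows. It is not part of the commit and has not been tested against a Falcon instance. The feed name AzureToHadoopReplication is hypothetical; the cluster names, WASB path, and dates are reused from the example in the patch. It assumes that a per-cluster location override on the source behaves the same way as the override on the target shown in the patch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical feed: replicate from Azure blob (source) to a Hadoop cluster (target). -->
<feed name="AzureToHadoopReplication" xmlns="uri:falcon:feed:0.1">
    <frequency>months(1)</frequency>
    <clusters>
        <!-- Source: the WASB location overrides the feed-level path below. -->
        <cluster name="SampleCluster1" type="source">
            <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/>
            <retention limit="days(90)" action="delete"/>
            <locations>
                <location type="data" path="wasb://replication-test@mystorage.blob.core.windows.net/replicated-${YEAR}-${MONTH}"/>
            </locations>
        </cluster>
        <!-- Target: no override, so it uses the feed-level location. -->
        <cluster name="SampleCluster2" type="target">
            <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/>
            <retention limit="days(90)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <location type="data" path="/apps/falcon/demo/data-${YEAR}-${MONTH}"/>
    </locations>
    <ACL owner="ambari-qa" group="users" permission="0755"/>
    <schema location="hcat" provider="hcat"/>
</feed>
```

As in the patch's example, the source and target clusters are different, which the page states is required.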
