falcon-commits mailing list archives

From b...@apache.org
Subject falcon git commit: FALCON-2027 Enhance documentation on data replication from HDP to Azure
Date Fri, 17 Jun 2016 18:05:45 GMT
Repository: falcon
Updated Branches:
  refs/heads/master d66185988 -> 037e6821b

FALCON-2027 Enhance documentation on data replication from HDP to Azure

Also fixed a typo in FalconDocumentation.twiki.

Author: yzheng-hortonworks <yzheng@hortonworks.com>

Reviewers: "Balu Vellanki <balu@apache.org>"

Closes #187 from yzheng-hortonworks/FALCON-2027

Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/037e6821
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/037e6821
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/037e6821

Branch: refs/heads/master
Commit: 037e6821b6eb7dbad1c0b1a3508aa6715e77e454
Parents: d661859
Author: yzheng-hortonworks <yzheng@hortonworks.com>
Authored: Fri Jun 17 11:05:37 2016 -0700
Committer: bvellanki <bvellanki@hortonworks.com>
Committed: Fri Jun 17 11:05:37 2016 -0700

 docs/src/site/twiki/DataReplicationAzure.twiki | 61 +++++++++++++++++++++
 docs/src/site/twiki/FalconDocumentation.twiki  |  4 +-
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/docs/src/site/twiki/DataReplicationAzure.twiki b/docs/src/site/twiki/DataReplicationAzure.twiki
new file mode 100644
index 0000000..24e543b
--- /dev/null
+++ b/docs/src/site/twiki/DataReplicationAzure.twiki
@@ -0,0 +1,61 @@
+---+ Data Replication between On-premise Hadoop Clusters and Azure Cloud
+---++ Overview
+Falcon provides an easy way to replicate data between on-premise Hadoop clusters and Azure cloud.
+With this feature, users would be able to build a hybrid data pipeline,
+e.g. processing sensitive data on-premises for privacy and compliance reasons
+while leverage cloud for elastic scale and online services (e.g. Azure machine learning) with non-sensitive data.
+---++ Use Case
+1. Copy data from on-premise Hadoop clusters to Azure cloud
+2. Copy data from Azure cloud to on-premise Hadoop clusters
+3. Copy data within Azure cloud (i.e. from one Azure location to another).
+---++ Usage
+---+++ Set Up Azure Blob Credentials
+To move data to/from Azure blobs, we need to add Azure blob credentials in HDFS.
+This can be done by adding the credential property through Ambari HDFS configs, and HDFS needs to be restarted after adding the credential.
+You can also add the credential property to core-site.xml directly, but make sure you restart HDFS from command line instead of Ambari.
+Otherwise, Ambari will take the previous HDFS configuration without your Azure blob credentials.
+      <name>fs.azure.account.key.{AZURE_BLOB_ACCOUNT_NAME}.blob.core.windows.net</name>
+      <value>{AZURE_BLOB_ACCOUNT_KEY}</value>
+To verify you set up Azure credential properly, you can check if you are able to access Azure blob through HDFS, e.g.
+hadoop fs -ls wasb://{AZURE_BLOB_CONTAINER}@{AZURE_BLOB_ACCOUNT_NAME}.blob.core.windows.net/
+---+++ Replication Feed
+[[EntitySpecification][Falcon replication feed]] can be used for data replication to/from Azure cloud.
+You can specify WASB (i.e. Windows Azure Storage Blob) url in source or target locations.
+See below for an example of data replication from Hadoop cluster to Azure blob.
+Note that the clusters for the source and the target need to be different.
+Analogously, if you want to copy data from Azure blob, you can add Azure blob location to the source.
+<?xml version="1.0" encoding="UTF-8"?>
+<feed name="AzureReplication" xmlns="uri:falcon:feed:0.1">
+    <frequency>months(1)</frequency>
+    <clusters>
+        <cluster name="SampleCluster1" type="source">
+            <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/>
+            <retention limit="days(90)" action="delete"/>
+        </cluster>
+        <cluster name="SampleCluster2" type="target">
+            <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/>
+            <retention limit="days(90)" action="delete"/>
+            <locations>
+                <location type="data" path="wasb://replication-test@mystorage.blob.core.windows.net/replicated-${YEAR}-${MONTH}"/>
+            </locations>
+        </cluster>
+    </clusters>
+    <locations>
+        <location type="data" path="/apps/falcon/demo/data-${YEAR}-${MONTH}" />
+    </locations>
+    <ACL owner="ambari-qa" group="users" permission="0755"/>
+    <schema location="hcat" provider="hcat"/>
+</feed>

diff --git a/docs/src/site/twiki/FalconDocumentation.twiki b/docs/src/site/twiki/FalconDocumentation.twiki
index 4848746..fe1c0de 100644
--- a/docs/src/site/twiki/FalconDocumentation.twiki
+++ b/docs/src/site/twiki/FalconDocumentation.twiki
@@ -447,9 +447,11 @@ cluster, (no dirty reads)
 ---+++ Archival as Replication
-Falcon allows users to archive data from on-premice to cloud, either Azure WASB or S3.
+Falcon allows users to archive data from on-premise to cloud, either Azure WASB or S3.
 It uses the underlying replication for archiving data from source to target. The archival URI is
 specified as the overridden location for the target cluster.
+Note that for data replication between on-premise and Azure cloud, Azure credentials need to be added to core-site.xml.
+Please refer to [[DataReplicationAzure][AzureDataReplication]] for details and examples.
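
For reference, the name/value pair listed under "Set Up Azure Blob Credentials" in the new page would normally sit inside a property element when placed in core-site.xml. The following is a minimal sketch, assuming the standard Hadoop configuration layout; the configuration and property wrapper elements are an assumption and do not appear in the archived diff, and the braces mark placeholders to substitute.

```xml
<configuration>
  <!-- Assumption: standard Hadoop core-site.xml layout; these wrapper elements are not part of the archived diff -->
  <property>
    <!-- Replace {AZURE_BLOB_ACCOUNT_NAME} with the storage account name -->
    <name>fs.azure.account.key.{AZURE_BLOB_ACCOUNT_NAME}.blob.core.windows.net</name>
    <!-- Replace {AZURE_BLOB_ACCOUNT_KEY} with the account access key -->
    <value>{AZURE_BLOB_ACCOUNT_KEY}</value>
  </property>
</configuration>
```

As the page notes, HDFS needs a restart after this change before the wasb:// listing check will pick up the credential.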
