From: balu@apache.org
To: commits@falcon.apache.org
Date: Tue, 12 Apr 2016 23:05:57 -0000
Message-Id: <397027a2582f4074a8fd325cd7cbc951@git.apache.org>
Subject: [3/3] falcon git commit: FALCON-1107 Move trusted extensions processing to server side

FALCON-1107 Move trusted extensions processing to server side

Ignore any documentation issues, as they will be addressed in https://issues.apache.org/jira/browse/FALCON-1106. Thanks!
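For orientation before the diff: the new hdfs-mirroring extension defines its user-facing properties in hdfs-mirroring-properties.json (added below). As a sketch only — the file name and submission mechanism are assumptions, not part of this patch — a job file wiring the required properties together might look like:

```properties
# Hypothetical hdfs-mirroring job file. Property names come from
# hdfs-mirroring-properties.json in this patch; all values are examples.
jobName=hdfs-monthly-sales-dr
jobClusterName=backupCluster
jobValidityStart=2016-03-03T00:00Z
jobValidityEnd=2018-03-13T00:00Z
jobFrequency=months(1)
sourceDir=/user/ambari-qa/primaryCluster/dr/input1
sourceCluster=primaryCluster
targetDir=/user/ambari-qa/backupCluster/dr
targetCluster=backupCluster
# Optional DistCp tuning
distcpMaxMaps=1
distcpMapBandwidth=100
```

The remaining properties (retry policy, ACLs, notifications) are optional and default server-side.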
Author: Sowmya Ramesh
Reviewers: Balu Vellanki, Venkat Ranganathan

Closes #92 from sowmyaramesh/FALCON-1107

Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/95bf312f
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/95bf312f
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/95bf312f

Branch: refs/heads/master
Commit: 95bf312f46bc96bc247645da6500b495c21aede3
Parents: c52961c
Author: Sowmya Ramesh
Authored: Tue Apr 12 16:05:48 2016 -0700
Committer: bvellanki
Committed: Tue Apr 12 16:05:48 2016 -0700

----------------------------------------------------------------------
 addons/extensions/hdfs-mirroring/README            |  29 ++
 addons/extensions/hdfs-mirroring/pom.xml           |  32 ++
 .../main/META/hdfs-mirroring-properties.json       | 137 +++++++
 .../runtime/hdfs-mirroring-template.xml            |  45 +++
 .../runtime/hdfs-mirroring-workflow.xml            |  82 +++++
 addons/extensions/hive-mirroring/README            |  58 +++
 addons/extensions/hive-mirroring/pom.xml           |  32 ++
 .../main/META/hive-mirroring-properties.json       | 179 +++++++++
 .../META/hive-mirroring-secure-properties.json     | 191 ++++++++++
 .../runtime/hive-mirroring-secure-template.xml     |  45 +++
 .../runtime/hive-mirroring-secure-workflow.xml     | 363 +++++++++++++++++++
 .../runtime/hive-mirroring-template.xml            |  45 +++
 .../runtime/hive-mirroring-workflow.xml            | 255 +++++++++++++
 .../java/org/apache/falcon/hive/HiveDRArgs.java    |   9 +-
 .../org/apache/falcon/hive/HiveDROptions.java      |  38 +-
 addons/recipes/hdfs-replication/README.txt         |  29 --
 addons/recipes/hdfs-replication/pom.xml            |  32 --
 .../resources/hdfs-replication-template.xml        |  44 ---
 .../resources/hdfs-replication-workflow.xml        |  82 -----
 .../main/resources/hdfs-replication.properties     |  79 ----
 .../recipes/hive-disaster-recovery/README.txt      |  58 ---
 addons/recipes/hive-disaster-recovery/pom.xml      |  32 --
 .../hive-disaster-recovery-secure-template.xml     |  45 ---
 .../hive-disaster-recovery-secure-workflow.xml     | 363 -------------------
 .../hive-disaster-recovery-secure.properties       | 110 ------
 .../hive-disaster-recovery-template.xml            |  45 ---
 .../hive-disaster-recovery-workflow.xml            | 249 -------------
 .../resources/hive-disaster-recovery.properties    |  98 -----
 .../falcon/catalog/AbstractCatalogService.java     |  12 +
 .../falcon/catalog/HiveCatalogService.java         |  16 +
 common/src/main/resources/startup.properties       |   2 +
 extensions/pom.xml                                 | 112 ++++++
 .../falcon/extensions/AbstractExtension.java       |  58 +++
 .../org/apache/falcon/extensions/Extension.java    | 102 ++++++
 .../falcon/extensions/ExtensionBuilder.java        |  32 ++
 .../falcon/extensions/ExtensionFactory.java        |  48 +++
 .../falcon/extensions/ExtensionProperties.java     |  89 +++++
 .../falcon/extensions/ExtensionService.java        |  49 +++
 .../mirroring/hdfs/HdfsMirroringExtension.java     | 111 ++++++
 .../hdfs/HdfsMirroringExtensionProperties.java     |  65 ++++
 .../mirroring/hive/HiveMirroringExtension.java     | 231 ++++++++++++
 .../hive/HiveMirroringExtensionProperties.java     |  92 +++++
 .../falcon/extensions/store/ExtensionStore.java    | 215 +++++++++++
 .../util/ExtensionProcessBuilderUtils.java         | 309 ++++++++++++++++
 .../falcon/extensions/ExtensionServiceTest.java    |  53 +++
 .../apache/falcon/extensions/ExtensionTest.java    | 160 ++++++++
 .../store/AbstractTestExtensionStore.java          | 103 ++++++
 .../extensions/store/ExtensionStoreTest.java       |  65 ++++
 .../src/test/resources/backup-cluster-0.1.xml      |  44 +++
 .../test/resources/hdfs-mirroring-template.xml     |  45 +++
 .../test/resources/hive-mirroring-template.xml     |  45 +++
 .../src/test/resources/primary-cluster-0.1.xml     |  44 +++
 oozie/pom.xml                                      |   6 +
 .../service/SharedLibraryHostingService.java       |  91 ++++-
 pom.xml                                            |   2 +
 src/main/assemblies/distributed-package.xml        |  79 +++-
 src/main/assemblies/standalone-package.xml         |  80 +++-
 57 files changed, 3851 insertions(+), 1315 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hdfs-mirroring/README
----------------------------------------------------------------------
diff --git a/addons/extensions/hdfs-mirroring/README b/addons/extensions/hdfs-mirroring/README
new file mode 100644
index 0000000..78f1726
--- /dev/null
+++ b/addons/extensions/hdfs-mirroring/README
@@ -0,0 +1,29 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+HDFS Directory Replication Extension
+
+Overview
+This extension replicates arbitrary directories on HDFS from one
+Hadoop cluster to another Hadoop cluster.
+It piggybacks on the replication solution in Falcon, which uses the DistCp tool.
+
+Use Case
+* Copy directories between HDFS clusters, including out-dated partitions
+* Archive directories from HDFS to cloud storage, e.g. S3 or Azure WASB
+
+Limitations
+As the data volume and number of files grow, this can get inefficient.
http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hdfs-mirroring/pom.xml ---------------------------------------------------------------------- diff --git a/addons/extensions/hdfs-mirroring/pom.xml b/addons/extensions/hdfs-mirroring/pom.xml new file mode 100644 index 0000000..cb9304e --- /dev/null +++ b/addons/extensions/hdfs-mirroring/pom.xml @@ -0,0 +1,32 @@ + + + + + + + 4.0.0 + org.apache.falcon.extensions + falcon-hdfs-mirroring-extension + 0.10-SNAPSHOT + Apache Falcon sample Hdfs mirroring extension + Apache Falcon sample Hdfs mirroring extension + jar + http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hdfs-mirroring/src/main/META/hdfs-mirroring-properties.json ---------------------------------------------------------------------- diff --git a/addons/extensions/hdfs-mirroring/src/main/META/hdfs-mirroring-properties.json b/addons/extensions/hdfs-mirroring/src/main/META/hdfs-mirroring-properties.json new file mode 100644 index 0000000..f1b4775 --- /dev/null +++ b/addons/extensions/hdfs-mirroring/src/main/META/hdfs-mirroring-properties.json @@ -0,0 +1,137 @@ +{ + "shortDescription": "This extension implements replicating arbitrary directories on HDFS from one Hadoop cluster to another Hadoop cluster. 
This piggybacks on the replication solution in Falcon, which uses the DistCp tool.", + "properties":[ + { + "propertyName":"jobName", + "required":true, + "description":"Unique job name", + "example":"hdfs-monthly-sales-dr" + }, + { + "propertyName":"jobClusterName", + "required":true, + "description":"Cluster where job should run", + "example":"backupCluster" + }, + { + "propertyName":"jobValidityStart", + "required":true, + "description":"Job validity start time", + "example":"2016-03-03T00:00Z" + }, + { + "propertyName":"jobValidityEnd", + "required":true, + "description":"Job validity end time", + "example":"2018-03-13T00:00Z" + }, + { + "propertyName":"jobFrequency", + "required":true, + "description":"Job frequency. Valid frequency types are minutes, hours, days, months", + "example":"months(1)" + }, + { + "propertyName":"jobTimezone", + "required":false, + "description":"Time zone for the job", + "example":"GMT" + }, + { + "propertyName":"jobTags", + "required":false, + "description":"List of comma-separated tags.
Key Value Pairs, separated by comma", + "example":"consumer=consumer@xyz.com, owner=producer@xyz.com, _department_type=forecasting" + }, + { + "propertyName":"jobRetryPolicy", + "required":false, + "description":"Job retry policy", + "example":"periodic" + }, + { + "propertyName":"jobRetryDelay", + "required":false, + "description":"Job retry delay", + "example":"minutes(30)" + }, + { + "propertyName":"jobRetryAttempts", + "required":false, + "description":"Job retry attempts", + "example":"3" + }, + { + "propertyName":"jobRetryOnTimeout", + "required":false, + "description":"Job retry on timeout", + "example":"true" + }, + { + "propertyName":"jobAclOwner", + "required":false, + "description":"ACL owner", + "example":"ambari-qa" + }, + { + "propertyName":"jobAclGroup", + "required":false, + "description":"ACL group", + "example":"users" + }, + { + "propertyName":"jobAclPermission", + "required":false, + "description":"ACL permission", + "example":"0x755" + }, + { + "propertyName":"sourceDir", + "required":true, + "description":"Multiple hdfs comma separated source directories", + "example":"/user/ambari-qa/primaryCluster/dr/input1, /user/ambari-qa/primaryCluster/dr/input2" + }, + { + "propertyName":"sourceCluster", + "required":true, + "description":"Source cluster for hdfs mirroring", + "example":"primaryCluster" + }, + { + "propertyName":"targetDir", + "required":true, + "description":"Target hdfs directory", + "example":"/user/ambari-qa/backupCluster/dr" + }, + { + "propertyName":"targetCluster", + "required":true, + "description":"Target cluster for hdfs mirroring", + "example":"backupCluster" + }, + { + "propertyName":"distcpMaxMaps", + "required":false, + "description":"Maximum number of mappers for DistCP", + "example":"1" + }, + { + "propertyName":"distcpMapBandwidth", + "required":false, + "description":"Bandwidth in MB for each mapper in DistCP", + "example":"100" + }, + { + "propertyName":"jobNotificationType", + "required":false, + "description":"Email 
Notification for Falcon instance completion", + "example":"email" + }, + { + "propertyName":"jobNotificationReceivers", + "required":false, + "description":"Comma separated email Id's", + "example":"user1@gmail.com, user2@gmail.com" + } + ] +} http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hdfs-mirroring/src/main/resources/runtime/hdfs-mirroring-template.xml ---------------------------------------------------------------------- diff --git a/addons/extensions/hdfs-mirroring/src/main/resources/runtime/hdfs-mirroring-template.xml b/addons/extensions/hdfs-mirroring/src/main/resources/runtime/hdfs-mirroring-template.xml new file mode 100644 index 0000000..d511d00 --- /dev/null +++ b/addons/extensions/hdfs-mirroring/src/main/resources/runtime/hdfs-mirroring-template.xml @@ -0,0 +1,45 @@ + + + + + + + + + + + + + + 1 + + LAST_ONLY + ##jobFrequency## + ##jobTimezone## + + + + + + + + + + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hdfs-mirroring/src/main/resources/runtime/hdfs-mirroring-workflow.xml ---------------------------------------------------------------------- diff --git a/addons/extensions/hdfs-mirroring/src/main/resources/runtime/hdfs-mirroring-workflow.xml b/addons/extensions/hdfs-mirroring/src/main/resources/runtime/hdfs-mirroring-workflow.xml new file mode 100644 index 0000000..1e2282c --- /dev/null +++ b/addons/extensions/hdfs-mirroring/src/main/resources/runtime/hdfs-mirroring-workflow.xml @@ -0,0 +1,82 @@ + + + + + + + ${jobTracker} + ${nameNode} + + + oozie.launcher.mapreduce.job.user.classpath.first + true + + + mapred.job.queue.name + ${queueName} + + + oozie.launcher.mapred.job.priority + ${jobPriority} + + + oozie.use.system.libpath + true + + + oozie.action.sharelib.for.java + distcp + + + oozie.launcher.oozie.libpath + ${wf:conf("falcon.libpath")} + + + oozie.launcher.mapreduce.job.hdfs-servers + ${sourceClusterFS},${targetClusterFS} + + + 
org.apache.falcon.replication.FeedReplicator + -Dmapred.job.queue.name=${queueName} + -Dmapred.job.priority=${jobPriority} + -maxMaps + ${distcpMaxMaps} + -mapBandwidth + ${distcpMapBandwidth} + -sourcePaths + ${sourceDir} + -targetPath + ${targetClusterFS}${targetDir} + -falconFeedStorageType + FILESYSTEM + -availabilityFlag + ${availabilityFlag == 'NA' ? "NA" : availabilityFlag} + -counterLogDir + ${logDir}/job-${nominalTime}/${srcClusterName == 'NA' ? '' : srcClusterName} + + + + + + Workflow action failed, error message[${wf:errorMessage(wf:lastErrorNode())}] + + + + http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hive-mirroring/README
----------------------------------------------------------------------
diff --git a/addons/extensions/hive-mirroring/README b/addons/extensions/hive-mirroring/README
new file mode 100644
index 0000000..827f7e5
--- /dev/null
+++ b/addons/extensions/hive-mirroring/README
@@ -0,0 +1,29 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+Hive Metastore Disaster Recovery Extension
+
+Overview
+This extension replicates hive metadata and data from one
+Hadoop cluster to another Hadoop cluster.
+It piggybacks on the replication solution in Falcon, which uses the DistCp tool.
+ +Use Case +* +* + +Limitations +* http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hive-mirroring/pom.xml ---------------------------------------------------------------------- diff --git a/addons/extensions/hive-mirroring/pom.xml b/addons/extensions/hive-mirroring/pom.xml new file mode 100644 index 0000000..adfb0be --- /dev/null +++ b/addons/extensions/hive-mirroring/pom.xml @@ -0,0 +1,32 @@ + + + + + + + 4.0.0 + org.apache.falcon.extensions + falcon-hive-mirroring-extension + 0.10-SNAPSHOT + Apache Falcon sample Hive mirroring extension + Apache Falcon sample Hive mirroring extension + jar + http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hive-mirroring/src/main/META/hive-mirroring-properties.json ---------------------------------------------------------------------- diff --git a/addons/extensions/hive-mirroring/src/main/META/hive-mirroring-properties.json b/addons/extensions/hive-mirroring/src/main/META/hive-mirroring-properties.json new file mode 100644 index 0000000..a9f3d1b --- /dev/null +++ b/addons/extensions/hive-mirroring/src/main/META/hive-mirroring-properties.json @@ -0,0 +1,179 @@ +{ + "shortDescription":"This extension implements replicating hive metadata and data from one Hadoop cluster to another Hadoop cluster.", + "properties":[ + { + "propertyName":"jobName", + "required":true, + "description":"Unique job name", + "example":"hive-monthly-sales-dr" + }, + { + "propertyName":"jobClusterName", + "required":true, + "description":"Cluster where job should run", + "example":"backupCluster" + }, + { + "propertyName":"jobValidityStart", + "required":true, + "description":"Job validity start time", + "example":"2016-03-03T00:00Z" + }, + { + "propertyName":"jobValidityEnd", + "required":true, + "description":"Job validity end time", + "example":"2018-03-13T00:00Z" + }, + { + "propertyName":"jobFrequency", + "required":true, + "description":"job frequency. 
Valid frequency types are minutes, hours, days, months", + "example":"months(1)" + }, + { + "propertyName":"jobTimezone", + "required":false, + "description":"Time zone for the job", + "example":"GMT" + }, + { + "propertyName":"jobTags", + "required":false, + "description":"list of comma separated tags. Key Value Pairs, separated by comma", + "example":"consumer=consumer@xyz.com, owner=producer@xyz.com, _department_type=forecasting" + }, + { + "propertyName":"jobRetryPolicy", + "required":false, + "description":"Job retry policy", + "example":"periodic" + }, + { + "propertyName":"jobRetryDelay", + "required":false, + "description":"Job retry delay", + "example":"minutes(30)" + }, + { + "propertyName":"jobRetryAttempts", + "required":false, + "description":"Job retry attempts", + "example":"3" + }, + { + "propertyName":"jobRetryOnTimeout", + "required":false, + "description":"Job retry on timeout", + "example":true + }, + { + "propertyName":"jobAclOwner", + "required":false, + "description":"ACL owner", + "example":"ambari-qa" + }, + { + "propertyName":"jobAclGroup", + "required":false, + "description":"ACL group", + "example":"users" + }, + { + "propertyName":"jobAclPermission", + "required":false, + "description":"ACL permission", + "example":"0x755" + }, + { + "propertyName":"sourceCluster", + "required":true, + "description":"Source cluster for hive mirroring", + "example":"primaryCluster" + }, + { + "propertyName":"sourceHiveServer2Uri", + "required":true, + "description":"Hive2 server end point", + "example":"hive2://localhost:10000" + }, + { + "propertyName":"sourceDatabases", + "required":true, + "description":"For DB level replication specify multiple comma separated databases to replicate", + "example":"salesDb" + }, + { + "propertyName":"sourceTables", + "required":false, + "description":"For table level replication specify multiple comma separated tables to replicate", + "example":"monthly_sales1, monthly_sales2" + }, + { + 
"propertyName":"sourceStagingPath", + "required":false, + "description":"Staging path on source", + "example":"/apps/hive/dr" + }, + { + "propertyName":"targetCluster", + "required":true, + "description":"Target cluster for hive mirroring", + "example":"backupCluster" + }, + { + "propertyName":"targetHiveServer2Uri", + "required":true, + "description":"Hive2 server endpoint", + "example":"hive2://localhost:10000" + }, + { + "propertyName":"targetStagingPath", + "required":false, + "description":"Staging path on target", + "example":"/apps/hive/dr" + }, + { + "propertyName":"maxEvents", + "required":false, + "description":"Caps the number of events processed each time the job runs. Set it based on your bandwidth limit; setting it to -1 processes all pending events but can saturate the bandwidth, so use it judiciously", + "example":"10000" + }, + { + "propertyName":"replicationMaxMaps", + "required":false, + "description":"Maximum number of mappers to use for hive replication", + "example":"1" + }, + { + "propertyName":"distcpMaxMaps", + "required":false, + "description":"Maximum number of mappers for DistCp", + "example":"1" + }, + { + "propertyName":"distcpMapBandwidth", + "required":false, + "description":"Bandwidth in MB for each mapper in DistCp", + "example":"100" + }, + { + "propertyName":"tdeEncryptionEnabled", + "required":false, + "description":"Set this flag to true if TDE encryption is enabled on source and target.
Default value is false", + "example":"true" + }, + { + "propertyName":"jobNotificationType", + "required":false, + "description":"Email Notification for Falcon instance completion", + "example":"email" + }, + { + "propertyName":"jobNotificationReceivers", + "required":false, + "description":"Comma separated email Id's", + "example":"user1@gmail.com, user2@gmail.com" + } + ] +} \ No newline at end of file http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hive-mirroring/src/main/META/hive-mirroring-secure-properties.json ---------------------------------------------------------------------- diff --git a/addons/extensions/hive-mirroring/src/main/META/hive-mirroring-secure-properties.json b/addons/extensions/hive-mirroring/src/main/META/hive-mirroring-secure-properties.json new file mode 100644 index 0000000..8ec03b5 --- /dev/null +++ b/addons/extensions/hive-mirroring/src/main/META/hive-mirroring-secure-properties.json @@ -0,0 +1,191 @@ +{ + "shortDescription": "This extension implements replicating hive metadata and data from one Hadoop cluster to another Hadoop cluster in secure environment.", + "properties":[ + { + "propertyName":"jobName", + "required":true, + "description":"Unique job name", + "example":"hive-monthly-sales-dr" + }, + { + "propertyName":"jobClusterName", + "required":true, + "description":"Cluster where job should run", + "example":"backupCluster" + }, + { + "propertyName":"jobValidityStart", + "required":true, + "description":"Job validity start time", + "example":"2016-03-03T00:00Z" + }, + { + "propertyName":"jobValidityEnd", + "required":true, + "description":"Job validity end time", + "example":"2018-03-13T00:00Z" + }, + { + "propertyName":"jobFrequency", + "required":true, + "description":"job frequency. 
Valid frequency types are minutes, hours, days, months", + "example":"months(1)" + }, + { + "propertyName":"jobTimezone", + "required":false, + "description":"Time zone for the job", + "example":"GMT" + }, + { + "propertyName":"jobTags", + "required":false, + "description":"list of comma separated tags. Key Value Pairs, separated by comma", + "example":"consumer=consumer@xyz.com, owner=producer@xyz.com, _department_type=forecasting" + }, + { + "propertyName":"jobRetryPolicy", + "required":false, + "description":"Job retry policy", + "example":"periodic" + }, + { + "propertyName":"jobRetryDelay", + "required":false, + "description":"Job retry delay", + "example":"minutes(30)" + }, + { + "propertyName":"jobRetryAttempts", + "required":false, + "description":"Job retry attempts", + "example":"3" + }, + { + "propertyName":"jobRetryOnTimeout", + "required":false, + "description":"Job retry on timeout", + "example":true + }, + { + "propertyName":"jobAclOwner", + "required":false, + "description":"ACL owner", + "example":"ambari-qa" + }, + { + "propertyName":"jobAclGroup", + "required":false, + "description":"ACL group", + "example":"users" + }, + { + "propertyName":"jobAclPermission", + "required":false, + "description":"ACL permission", + "example":"0x755" + }, + { + "propertyName":"sourceCluster", + "required":true, + "description":"Source cluster for hive mirroring", + "example":"primaryCluster" + }, + { + "propertyName":"sourceHiveServer2Uri", + "required":true, + "description":"Hive2 server end point", + "example":"hive2://localhost:10000" + }, + { + "propertyName":"sourceDatabases", + "required":true, + "description":"For DB level replication specify multiple comma separated databases to replicate", + "example":"salesDb" + }, + { + "propertyName":"sourceTables", + "required":false, + "description":"For table level replication specify multiple comma separated tables to replicate", + "example":"monthly_sales1, monthly_sales2" + }, + { + 
"propertyName":"sourceStagingPath", + "required":false, + "description":"Staging path on source", + "example":"/apps/hive/dr" + }, + { + "propertyName":"sourceHive2KerberosPrincipal", + "required":true, + "description":"Kerberos principal required to access the Hive server (required on secure clusters)", + "example":"hive/_HOST@EXAMPLE.COM" + }, + { + "propertyName":"targetCluster", + "required":true, + "description":"Target cluster for hive mirroring", + "example":"backupCluster" + }, + { + "propertyName":"targetHiveServer2Uri", + "required":true, + "description":"Hive2 server endpoint", + "example":"hive2://localhost:10000" + }, + { + "propertyName":"targetStagingPath", + "required":false, + "description":"Staging path on target", + "example":"/apps/hive/dr" + }, + { + "propertyName":"targetHive2KerberosPrincipal", + "required":true, + "description":"Kerberos principal required to access the Hive server (required on secure clusters)", + "example":"hive/_HOST@EXAMPLE.COM" + }, + { + "propertyName":"maxEvents", + "required":false, + "description":"Caps the number of events processed each time the job runs. Set it based on your bandwidth limit; setting it to -1 processes all pending events but can saturate the bandwidth, so use it judiciously", + "example":"10000" + }, + { + "propertyName":"replicationMaxMaps", + "required":false, + "description":"Maximum number of mappers to use for hive replication", + "example":"1" + }, + { + "propertyName":"distcpMaxMaps", + "required":false, + "description":"Maximum number of mappers for DistCp", + "example":"1" + }, + { + "propertyName":"distcpMapBandwidth", + "required":false, + "description":"Bandwidth in MB for each mapper in DistCp", + "example":"100" + }, + { + "propertyName":"tdeEncryptionEnabled", + "required":false, + "description":"Set this flag to true if TDE encryption is enabled on source and target.
Default value is false", + "example":"true" + }, + { + "propertyName":"jobNotificationType", + "required":false, + "description":"Email Notification for Falcon instance completion", + "example":"email" + }, + { + "propertyName":"jobNotificationReceivers", + "required":false, + "description":"Comma separated email Id's", + "example":"user1@gmail.com, user2@gmail.com" + } + ] +} \ No newline at end of file http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-secure-template.xml ---------------------------------------------------------------------- diff --git a/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-secure-template.xml b/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-secure-template.xml new file mode 100644 index 0000000..4497bb4 --- /dev/null +++ b/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-secure-template.xml @@ -0,0 +1,45 @@ + + + + + + + + + + + + + + 1 + + LAST_ONLY + ##jobFrequency## + ##jobTimezone## + + + + + + + + + + http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-secure-workflow.xml ---------------------------------------------------------------------- diff --git a/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-secure-workflow.xml b/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-secure-workflow.xml new file mode 100644 index 0000000..4bf048f --- /dev/null +++ b/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-secure-workflow.xml @@ -0,0 +1,363 @@ + + + + + + hcat.metastore.uri + ${sourceMetastoreUri} + + + hcat.metastore.principal + ${sourceHiveMetastoreKerberosPrincipal} + + + + + hcat.metastore.uri + ${targetMetastoreUri} + + + hcat.metastore.principal + ${targetHiveMetastoreKerberosPrincipal} + + + + + 
hive2.server.principal + ${sourceHive2KerberosPrincipal} + + + hive2.jdbc.url + jdbc:${sourceHiveServer2Uri}/${sourceDatabase} + + + + + hive2.server.principal + ${targetHive2KerberosPrincipal} + + + hive2.jdbc.url + jdbc:${targetHiveServer2Uri}/${sourceDatabase} + + + + + + + ${jobTracker} + ${nameNode} + + + oozie.launcher.mapreduce.job.user.classpath.first + true + + + mapred.job.queue.name + ${queueName} + + + oozie.launcher.mapred.job.priority + ${jobPriority} + + + oozie.use.system.libpath + true + + + oozie.action.sharelib.for.java + distcp,hive,hive2,hcatalog + + + oozie.launcher.mapreduce.job.hdfs-servers + ${sourceNN},${targetNN} + + + mapreduce.job.hdfs-servers + ${sourceNN},${targetNN} + + + org.apache.falcon.hive.HiveDRTool + -Dmapred.job.queue.name=${queueName} + -Dmapred.job.priority=${jobPriority} + -falconLibPath + ${wf:conf("falcon.libpath")} + -sourceCluster + ${sourceCluster} + -sourceMetastoreUri + ${sourceMetastoreUri} + -sourceHiveServer2Uri + ${sourceHiveServer2Uri} + -sourceDatabase + ${sourceDatabase} + -sourceTable + ${sourceTable} + -sourceStagingPath + ${sourceStagingPath} + -sourceNN + ${sourceNN} + -sourceNNKerberosPrincipal + ${sourceNNKerberosPrincipal} + -sourceHiveMetastoreKerberosPrincipal + ${sourceHiveMetastoreKerberosPrincipal} + -sourceHive2KerberosPrincipal + ${sourceHive2KerberosPrincipal} + -targetCluster + ${targetCluster} + -targetMetastoreUri + ${targetMetastoreUri} + -targetHiveServer2Uri + ${targetHiveServer2Uri} + -targetStagingPath + ${targetStagingPath} + -targetNN + ${targetNN} + -targetNNKerberosPrincipal + ${targetNNKerberosPrincipal} + -targetHiveMetastoreKerberosPrincipal + ${targetHiveMetastoreKerberosPrincipal} + -targetHive2KerberosPrincipal + ${targetHive2KerberosPrincipal} + -maxEvents + ${maxEvents} + -clusterForJobRun + ${clusterForJobRun} + -clusterForJobRunWriteEP + ${clusterForJobRunWriteEP} + -clusterForJobNNKerberosPrincipal + ${clusterForJobNNKerberosPrincipal} + -tdeEncryptionEnabled + 
${tdeEncryptionEnabled} + -jobName + ${jobName}-${nominalTime} + -executionStage + lastevents + + + + + + + + ${jobTracker} + ${nameNode} + + + oozie.launcher.mapreduce.job.user.classpath.first + true + + + mapred.job.queue.name + ${queueName} + + + oozie.launcher.mapred.job.priority + ${jobPriority} + + + oozie.use.system.libpath + true + + + oozie.action.sharelib.for.java + distcp,hive,hive2,hcatalog + + + oozie.launcher.mapreduce.job.hdfs-servers + ${sourceNN},${targetNN} + + + mapreduce.job.hdfs-servers + ${sourceNN},${targetNN} + + + org.apache.falcon.hive.HiveDRTool + -Dmapred.job.queue.name=${queueName} + -Dmapred.job.priority=${jobPriority} + -falconLibPath + ${wf:conf("falcon.libpath")} + -replicationMaxMaps + ${replicationMaxMaps} + -distcpMaxMaps + ${distcpMaxMaps} + -sourceCluster + ${sourceCluster} + -sourceMetastoreUri + ${sourceMetastoreUri} + -sourceHiveServer2Uri + ${sourceHiveServer2Uri} + -sourceDatabase + ${sourceDatabase} + -sourceTable + ${sourceTable} + -sourceStagingPath + ${sourceStagingPath} + -sourceNN + ${sourceNN} + -sourceNNKerberosPrincipal + ${sourceNNKerberosPrincipal} + -sourceHiveMetastoreKerberosPrincipal + ${sourceHiveMetastoreKerberosPrincipal} + -sourceHive2KerberosPrincipal + ${sourceHive2KerberosPrincipal} + -targetCluster + ${targetCluster} + -targetMetastoreUri + ${targetMetastoreUri} + -targetHiveServer2Uri + ${targetHiveServer2Uri} + -targetStagingPath + ${targetStagingPath} + -targetNN + ${targetNN} + -targetNNKerberosPrincipal + ${targetNNKerberosPrincipal} + -targetHiveMetastoreKerberosPrincipal + ${targetHiveMetastoreKerberosPrincipal} + -targetHive2KerberosPrincipal + ${targetHive2KerberosPrincipal} + -maxEvents + ${maxEvents} + -distcpMapBandwidth + ${distcpMapBandwidth} + -clusterForJobRun + ${clusterForJobRun} + -clusterForJobRunWriteEP + ${clusterForJobRunWriteEP} + -clusterForJobNNKerberosPrincipal + ${clusterForJobNNKerberosPrincipal} + -tdeEncryptionEnabled + ${tdeEncryptionEnabled} + -jobName + 
${jobName}-${nominalTime} + -executionStage + export + -counterLogDir + ${logDir}/job-${nominalTime}/${srcClusterName == 'NA' ? '' : srcClusterName}/ + + + + + + + + ${jobTracker} + ${nameNode} + + + oozie.launcher.mapreduce.job.user.classpath.first + true + + + mapred.job.queue.name + ${queueName} + + + oozie.launcher.mapred.job.priority + ${jobPriority} + + + oozie.use.system.libpath + true + + + oozie.action.sharelib.for.java + distcp,hive,hive2,hcatalog + + + oozie.launcher.mapreduce.job.hdfs-servers + ${sourceNN},${targetNN} + + + mapreduce.job.hdfs-servers + ${sourceNN},${targetNN} + + + org.apache.falcon.hive.HiveDRTool + -Dmapred.job.queue.name=${queueName} + -Dmapred.job.priority=${jobPriority} + -falconLibPath + ${wf:conf("falcon.libpath")} + -replicationMaxMaps + ${replicationMaxMaps} + -distcpMaxMaps + ${distcpMaxMaps} + -sourceCluster + ${sourceCluster} + -sourceMetastoreUri + ${sourceMetastoreUri} + -sourceHiveServer2Uri + ${sourceHiveServer2Uri} + -sourceDatabase + ${sourceDatabase} + -sourceTable + ${sourceTable} + -sourceStagingPath + ${sourceStagingPath} + -sourceNN + ${sourceNN} + -sourceNNKerberosPrincipal + ${sourceNNKerberosPrincipal} + -sourceHiveMetastoreKerberosPrincipal + ${sourceHiveMetastoreKerberosPrincipal} + -sourceHive2KerberosPrincipal + ${sourceHive2KerberosPrincipal} + -targetCluster + ${targetCluster} + -targetMetastoreUri + ${targetMetastoreUri} + -targetHiveServer2Uri + ${targetHiveServer2Uri} + -targetStagingPath + ${targetStagingPath} + -targetNN + ${targetNN} + -targetNNKerberosPrincipal + ${targetNNKerberosPrincipal} + -targetHiveMetastoreKerberosPrincipal + ${targetHiveMetastoreKerberosPrincipal} + -targetHive2KerberosPrincipal + ${targetHive2KerberosPrincipal} + -maxEvents + ${maxEvents} + -distcpMapBandwidth + ${distcpMapBandwidth} + -clusterForJobRun + ${clusterForJobRun} + -clusterForJobRunWriteEP + ${clusterForJobRunWriteEP} + -clusterForJobNNKerberosPrincipal + ${clusterForJobNNKerberosPrincipal} + 
-tdeEncryptionEnabled + ${tdeEncryptionEnabled} + -jobName + ${jobName}-${nominalTime} + -executionStage + import + + + + + + + Workflow action failed, error message[${wf:errorMessage(wf:lastErrorNode())}] + + + + http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-template.xml ---------------------------------------------------------------------- diff --git a/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-template.xml b/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-template.xml new file mode 100644 index 0000000..4497bb4 --- /dev/null +++ b/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-template.xml @@ -0,0 +1,45 @@ + + + + + + + + + + + + + + 1 + + LAST_ONLY + ##jobFrequency## + ##jobTimezone## + + + + + + + + + + http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-workflow.xml ---------------------------------------------------------------------- diff --git a/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-workflow.xml b/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-workflow.xml new file mode 100644 index 0000000..9f9bf92 --- /dev/null +++ b/addons/extensions/hive-mirroring/src/main/resources/runtime/hive-mirroring-workflow.xml @@ -0,0 +1,255 @@ + + + + + + ${jobTracker} + ${nameNode} + + + oozie.launcher.mapreduce.job.user.classpath.first + true + + + mapred.job.queue.name + ${queueName} + + + oozie.launcher.mapred.job.priority + ${jobPriority} + + + oozie.use.system.libpath + true + + + oozie.action.sharelib.for.java + distcp,hive,hive2,hcatalog + + + org.apache.falcon.hive.HiveDRTool + -Dmapred.job.queue.name=${queueName} + -Dmapred.job.priority=${jobPriority} + -falconLibPath + ${wf:conf("falcon.libpath")} + -sourceCluster + ${sourceCluster} + 
-sourceMetastoreUri + ${sourceMetastoreUri} + -sourceHiveServer2Uri + ${sourceHiveServer2Uri} + -sourceDatabase + ${sourceDatabase} + -sourceTable + ${sourceTable} + -sourceStagingPath + ${sourceStagingPath} + -sourceNN + ${sourceNN} + -targetCluster + ${targetCluster} + -targetMetastoreUri + ${targetMetastoreUri} + -targetHiveServer2Uri + ${targetHiveServer2Uri} + -targetStagingPath + ${targetStagingPath} + -targetNN + ${targetNN} + -maxEvents + ${maxEvents} + -clusterForJobRun + ${clusterForJobRun} + -clusterForJobRunWriteEP + ${clusterForJobRunWriteEP} + -tdeEncryptionEnabled + ${tdeEncryptionEnabled} + -jobName + ${jobName}-${nominalTime} + -executionStage + lastevents + + + + + + + + ${jobTracker} + ${nameNode} + + + oozie.launcher.mapreduce.job.user.classpath.first + true + + + mapred.job.queue.name + ${queueName} + + + oozie.launcher.mapred.job.priority + ${jobPriority} + + + oozie.use.system.libpath + true + + + oozie.action.sharelib.for.java + distcp,hive,hive2,hcatalog + + + org.apache.falcon.hive.HiveDRTool + -Dmapred.job.queue.name=${queueName} + -Dmapred.job.priority=${jobPriority} + -falconLibPath + ${wf:conf("falcon.libpath")} + -replicationMaxMaps + ${replicationMaxMaps} + -distcpMaxMaps + ${distcpMaxMaps} + -sourceCluster + ${sourceCluster} + -sourceMetastoreUri + ${sourceMetastoreUri} + -sourceHiveServer2Uri + ${sourceHiveServer2Uri} + -sourceDatabase + ${sourceDatabase} + -sourceTable + ${sourceTable} + -sourceStagingPath + ${sourceStagingPath} + -sourceNN + ${sourceNN} + -targetCluster + ${targetCluster} + -targetMetastoreUri + ${targetMetastoreUri} + -targetHiveServer2Uri + ${targetHiveServer2Uri} + -targetStagingPath + ${targetStagingPath} + -targetNN + ${targetNN} + -maxEvents + ${maxEvents} + -distcpMapBandwidth + ${distcpMapBandwidth} + -clusterForJobRun + ${clusterForJobRun} + -clusterForJobRunWriteEP + ${clusterForJobRunWriteEP} + -tdeEncryptionEnabled + ${tdeEncryptionEnabled} + -jobName + ${jobName}-${nominalTime} + -executionStage + 
export + -counterLogDir + ${logDir}/job-${nominalTime}/${srcClusterName == 'NA' ? '' : srcClusterName}/ + + + + + + + + ${jobTracker} + ${nameNode} + + + oozie.launcher.mapreduce.job.user.classpath.first + true + + + mapred.job.queue.name + ${queueName} + + + oozie.launcher.mapred.job.priority + ${jobPriority} + + + oozie.use.system.libpath + true + + + oozie.action.sharelib.for.java + distcp,hive,hive2,hcatalog + + + org.apache.falcon.hive.HiveDRTool + -Dmapred.job.queue.name=${queueName} + -Dmapred.job.priority=${jobPriority} + -falconLibPath + ${wf:conf("falcon.libpath")} + -replicationMaxMaps + ${replicationMaxMaps} + -distcpMaxMaps + ${distcpMaxMaps} + -sourceCluster + ${sourceCluster} + -sourceMetastoreUri + ${sourceMetastoreUri} + -sourceHiveServer2Uri + ${sourceHiveServer2Uri} + -sourceDatabase + ${sourceDatabase} + -sourceTable + ${sourceTable} + -sourceStagingPath + ${sourceStagingPath} + -sourceNN + ${sourceNN} + -targetCluster + ${targetCluster} + -targetMetastoreUri + ${targetMetastoreUri} + -targetHiveServer2Uri + ${targetHiveServer2Uri} + -targetStagingPath + ${targetStagingPath} + -targetNN + ${targetNN} + -maxEvents + ${maxEvents} + -distcpMapBandwidth + ${distcpMapBandwidth} + -clusterForJobRun + ${clusterForJobRun} + -clusterForJobRunWriteEP + ${clusterForJobRunWriteEP} + -tdeEncryptionEnabled + ${tdeEncryptionEnabled} + -jobName + ${jobName}-${nominalTime} + -executionStage + import + + + + + + + Workflow action failed, error message[${wf:errorMessage(wf:lastErrorNode())}] + + + + http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/hivedr/src/main/java/org/apache/falcon/hive/HiveDRArgs.java ---------------------------------------------------------------------- diff --git a/addons/hivedr/src/main/java/org/apache/falcon/hive/HiveDRArgs.java b/addons/hivedr/src/main/java/org/apache/falcon/hive/HiveDRArgs.java index c9ad47e..71b9043 100644 --- a/addons/hivedr/src/main/java/org/apache/falcon/hive/HiveDRArgs.java +++ 
b/addons/hivedr/src/main/java/org/apache/falcon/hive/HiveDRArgs.java @@ -32,7 +32,7 @@ public enum HiveDRArgs { SOURCE_HS2_URI("sourceHiveServer2Uri", "source HS2 uri"), SOURCE_DATABASE("sourceDatabase", "comma source databases"), SOURCE_TABLE("sourceTable", "comma source tables"), - SOURCE_STAGING_PATH("sourceStagingPath", "source staging path for data"), + SOURCE_STAGING_PATH("sourceStagingPath", "source staging path for data", false), // source hadoop endpoints SOURCE_NN("sourceNN", "source name node"), @@ -47,7 +47,7 @@ public enum HiveDRArgs { TARGET_METASTORE_URI("targetMetastoreUri", "source meta store uri"), TARGET_HS2_URI("targetHiveServer2Uri", "source meta store uri"), - TARGET_STAGING_PATH("targetStagingPath", "source staging path for data"), + TARGET_STAGING_PATH("targetStagingPath", "source staging path for data", false), // target hadoop endpoints TARGET_NN("targetNN", "target name node"), @@ -70,16 +70,13 @@ public enum HiveDRArgs { // Map Bandwidth DISTCP_MAP_BANDWIDTH("distcpMapBandwidth", "map bandwidth in mb", false), - JOB_NAME("drJobName", "unique job name"), + JOB_NAME("jobName", "unique job name"), CLUSTER_FOR_JOB_RUN("clusterForJobRun", "cluster where job runs"), JOB_CLUSTER_NN("clusterForJobRunWriteEP", "write end point of cluster where job runs"), JOB_CLUSTER_NN_KERBEROS_PRINCIPAL("clusterForJobNNKerberosPrincipal", "Namenode kerberos principal of cluster on which replication job runs", false), - - FALCON_LIBPATH("falconLibPath", "Falcon Lib Path for Jar files", false), - KEEP_HISTORY("keepHistory", "Keep history of events file generated", false), EXECUTION_STAGE("executionStage", "Flag for workflow stage execution", false), COUNTER_LOGDIR("counterLogDir", "Log directory to store counter file", false); http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/hivedr/src/main/java/org/apache/falcon/hive/HiveDROptions.java ---------------------------------------------------------------------- diff --git 
a/addons/hivedr/src/main/java/org/apache/falcon/hive/HiveDROptions.java b/addons/hivedr/src/main/java/org/apache/falcon/hive/HiveDROptions.java index 868ec8d..0096727 100644 --- a/addons/hivedr/src/main/java/org/apache/falcon/hive/HiveDROptions.java +++ b/addons/hivedr/src/main/java/org/apache/falcon/hive/HiveDROptions.java @@ -24,7 +24,7 @@ import org.apache.commons.cli.Option; import org.apache.commons.cli.Options; import org.apache.commons.cli.ParseException; import org.apache.commons.lang3.StringUtils; -import org.apache.falcon.hive.exception.HiveReplicationException; +import org.apache.falcon.hive.util.FileUtils; import java.io.File; import java.util.Arrays; @@ -70,11 +70,14 @@ public class HiveDROptions { return Arrays.asList(context.get(HiveDRArgs.SOURCE_TABLE).trim().split(",")); } - public String getSourceStagingPath() throws HiveReplicationException { - if (StringUtils.isNotEmpty(context.get(HiveDRArgs.SOURCE_STAGING_PATH))) { - return context.get(HiveDRArgs.SOURCE_STAGING_PATH) + File.separator + getJobName(); + public String getSourceStagingPath() { + String stagingPath = context.get(HiveDRArgs.SOURCE_STAGING_PATH); + if (StringUtils.isNotBlank(stagingPath)) { + stagingPath = StringUtils.removeEnd(stagingPath, File.separator); + return stagingPath + File.separator + getJobName(); + } else { + return FileUtils.DEFAULT_EVENT_STORE_PATH + getJobName(); } - throw new HiveReplicationException("Source StagingPath cannot be empty"); } public String getSourceWriteEP() { @@ -100,15 +103,19 @@ public class HiveDROptions { public String getTargetMetastoreKerberosPrincipal() { return context.get(HiveDRArgs.TARGET_HIVE_METASTORE_KERBEROS_PRINCIPAL); } + public String getTargetHive2KerberosPrincipal() { return context.get(HiveDRArgs.TARGET_HIVE2_KERBEROS_PRINCIPAL); } - public String getTargetStagingPath() throws HiveReplicationException { - if (StringUtils.isNotEmpty(context.get(HiveDRArgs.TARGET_STAGING_PATH))) { - return context.get(HiveDRArgs.TARGET_STAGING_PATH) 
+ File.separator + getJobName(); + public String getTargetStagingPath() { + String stagingPath = context.get(HiveDRArgs.TARGET_STAGING_PATH); + if (StringUtils.isNotBlank(stagingPath)) { + stagingPath = StringUtils.removeEnd(stagingPath, File.separator); + return stagingPath + File.separator + getJobName(); + } else { + return FileUtils.DEFAULT_EVENT_STORE_PATH + getJobName(); } - throw new HiveReplicationException("Target StagingPath cannot be empty"); } public String getReplicationMaxMaps() { @@ -135,23 +142,10 @@ public class HiveDROptions { return context.get(HiveDRArgs.JOB_CLUSTER_NN_KERBEROS_PRINCIPAL); } - public void setSourceStagingDir(String path) { - context.put(HiveDRArgs.SOURCE_STAGING_PATH, path); - } - - public void setTargetStagingDir(String path) { - context.put(HiveDRArgs.TARGET_STAGING_PATH, path); - } - public String getExecutionStage() { return context.get(HiveDRArgs.EXECUTION_STAGE); } - public boolean isTDEEncryptionEnabled() { - return StringUtils.isEmpty(context.get(HiveDRArgs.TDE_ENCRYPTION_ENABLED)) - ? false : Boolean.valueOf(context.get(HiveDRArgs.TDE_ENCRYPTION_ENABLED)); - } - public boolean shouldBlock() { return true; } http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/recipes/hdfs-replication/README.txt ---------------------------------------------------------------------- diff --git a/addons/recipes/hdfs-replication/README.txt b/addons/recipes/hdfs-replication/README.txt deleted file mode 100644 index 5742d43..0000000 --- a/addons/recipes/hdfs-replication/README.txt +++ /dev/null @@ -1,29 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. 
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -HDFS Directory Replication Recipe - -Overview -This recipe implements replication of arbitrary directories on HDFS from one -Hadoop cluster to another Hadoop cluster. -This piggybacks on the replication solution in Falcon, which uses the DistCp tool. - -Use Case -* Copy directories between HDFS clusters with outdated partitions -* Archive directories from HDFS to Cloud. Ex: S3, Azure WASB - -Limitations -As the data volume and number of files grow, this can get inefficient. http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/recipes/hdfs-replication/pom.xml ---------------------------------------------------------------------- diff --git a/addons/recipes/hdfs-replication/pom.xml b/addons/recipes/hdfs-replication/pom.xml deleted file mode 100644 index 98d9795..0000000 --- a/addons/recipes/hdfs-replication/pom.xml +++ /dev/null @@ -1,32 +0,0 @@ - - - - - - - 4.0.0 - org.apache.falcon.recipes - falcon-hdfs-replication-recipe - 0.10-SNAPSHOT - Apache Falcon Sample Hdfs Replication Recipe - Apache Falcon Sample Hdfs Replication Recipe - jar - http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/recipes/hdfs-replication/src/main/resources/hdfs-replication-template.xml ---------------------------------------------------------------------- diff --git a/addons/recipes/hdfs-replication/src/main/resources/hdfs-replication-template.xml b/addons/recipes/hdfs-replication/src/main/resources/hdfs-replication-template.xml deleted file mode 100644 index 441a189..0000000 ---
a/addons/recipes/hdfs-replication/src/main/resources/hdfs-replication-template.xml +++ /dev/null @@ -1,44 +0,0 @@ - - - - - - - - - - - - _falcon_mirroring_type=HDFS - - 1 - - LAST_ONLY - ##falcon.recipe.frequency## - UTC - - - - - - - - - - \ No newline at end of file http://git-wip-us.apache.org/repos/asf/falcon/blob/95bf312f/addons/recipes/hdfs-replication/src/main/resources/hdfs-replication-workflow.xml ---------------------------------------------------------------------- diff --git a/addons/recipes/hdfs-replication/src/main/resources/hdfs-replication-workflow.xml b/addons/recipes/hdfs-replication/src/main/resources/hdfs-replication-workflow.xml deleted file mode 100644 index c1966be..0000000 --- a/addons/recipes/hdfs-replication/src/main/resources/hdfs-replication-workflow.xml +++ /dev/null @@ -1,82 +0,0 @@ - - - - - - - ${jobTracker} - ${nameNode} - - - oozie.launcher.mapreduce.job.user.classpath.first - true - - - mapred.job.queue.name - ${queueName} - - - oozie.launcher.mapred.job.priority - ${jobPriority} - - - oozie.use.system.libpath - true - - - oozie.action.sharelib.for.java - distcp - - - oozie.launcher.oozie.libpath - ${wf:conf("falcon.libpath")} - - - oozie.launcher.mapreduce.job.hdfs-servers - ${drSourceClusterFS},${drTargetClusterFS} - - - org.apache.falcon.replication.FeedReplicator - -Dmapred.job.queue.name=${queueName} - -Dmapred.job.priority=${jobPriority} - -maxMaps - ${distcpMaxMaps} - -mapBandwidth - ${distcpMapBandwidth} - -sourcePaths - ${drSourceDir} - -targetPath - ${drTargetClusterFS}${drTargetDir} - -falconFeedStorageType - FILESYSTEM - -availabilityFlag - ${availabilityFlag == 'NA' ? "NA" : availabilityFlag} - -counterLogDir - ${logDir}/job-${nominalTime}/${srcClusterName == 'NA' ? '' : srcClusterName} - - - - - - - Workflow action failed, error message[${wf:errorMessage(wf:lastErrorNode())}] - - - -
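The HiveDROptions change in this patch replaces the hard failure on an empty staging path (the removed HiveReplicationException) with a fallback to FileUtils.DEFAULT_EVENT_STORE_PATH, and normalizes a trailing separator before appending the job name. A minimal sketch of that resolution logic follows; `resolveStagingPath` is a hypothetical helper distilling the patched getSourceStagingPath/getTargetStagingPath methods, and the default-path value used here is a placeholder, not the actual Falcon constant.

```java
// Sketch only: distills the staging-path resolution introduced by this patch.
public class StagingPathSketch {
    // Placeholder for FileUtils.DEFAULT_EVENT_STORE_PATH; the real value lives in Falcon.
    static final String DEFAULT_EVENT_STORE_PATH = "/apps/data-mirroring/";

    static String resolveStagingPath(String stagingPath, String jobName) {
        if (stagingPath != null && !stagingPath.trim().isEmpty()) {
            // Mirror StringUtils.removeEnd: drop one trailing separator so the
            // result is never "path//jobName".
            if (stagingPath.endsWith("/")) {
                stagingPath = stagingPath.substring(0, stagingPath.length() - 1);
            }
            return stagingPath + "/" + jobName;
        }
        // Blank or missing input no longer throws; fall back to the default store.
        return DEFAULT_EVENT_STORE_PATH + jobName;
    }

    public static void main(String[] args) {
        System.out.println(resolveStagingPath("/tmp/staging/", "drJob"));
        System.out.println(resolveStagingPath(null, "drJob"));
    }
}
```

The behavioral point is that both staging-path options can now be omitted from the extension properties, which is what allows the enum entries above to be marked non-required.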
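The HiveDRArgs hunk marks SOURCE_STAGING_PATH and TARGET_STAGING_PATH optional by passing `false` as a third constructor argument. A simplified stand-in for that enum pattern is sketched below (not the actual Falcon class): each constant carries an option name, a description, and an isRequired flag, with a two-argument constructor defaulting to required so existing entries are untouched.

```java
// Sketch of the enum-backed argument pattern used by HiveDRArgs in this patch.
public class OptionalArgSketch {
    enum Arg {
        // Patched entries pass an explicit third argument to become optional.
        SOURCE_STAGING_PATH("sourceStagingPath", "source staging path for data", false),
        // Unchanged entries use the two-argument form and stay required.
        SOURCE_NN("sourceNN", "source name node");

        private final String optionName;
        private final String description;
        private final boolean required;

        Arg(String optionName, String description, boolean required) {
            this.optionName = optionName;
            this.description = description;
            this.required = required;
        }

        // Defaulting constructor: required unless stated otherwise.
        Arg(String optionName, String description) {
            this(optionName, description, true);
        }

        String getOptionName() { return optionName; }
        String getDescription() { return description; }
        boolean isRequired() { return required; }
    }

    public static void main(String[] args) {
        for (Arg a : Arg.values()) {
            System.out.println(a.getOptionName() + " required=" + a.isRequired());
        }
    }
}
```

In the real class these flags feed commons-cli Option construction, so an unset optional argument simply yields a null value for the workflow to default, rather than a parse failure.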