falcon-dev mailing list archives

From "Srikanth Sundarrajan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-93) Replication to handle hive table replication
Date Thu, 10 Oct 2013 08:16:42 GMT

    [ https://issues.apache.org/jira/browse/FALCON-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791301#comment-13791301 ]

Srikanth Sundarrajan commented on FALCON-93:
--------------------------------------------

Can the prefixes be "falcon.source." and "falcon.target." instead of just "source" and "target"?
{code}
+                    propagateStorageProperties(srcCluster, (CatalogStorage) sourceStorage, props, "source");
+                    propagateStorageProperties(trgCluster, (CatalogStorage) targetStorage, props, "target");
{code}
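A minimal sketch of the suggested naming, assuming the storage properties are plain string key/value pairs. `StoragePropertyPrefixer`, `prefixProperties` and the sample keys are hypothetical; this is not the patch's actual `propagateStorageProperties` implementation, only an illustration of how namespaced prefixes would appear in the workflow properties:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StoragePropertyPrefixer {
    // Prefixes suggested in the review; namespacing them under "falcon."
    // avoids collisions with unrelated properties starting with "source".
    static final String SOURCE_PREFIX = "falcon.source.";
    static final String TARGET_PREFIX = "falcon.target.";

    private StoragePropertyPrefixer() {}

    /** Hypothetical helper: copies every storage property into props under the given prefix. */
    static void prefixProperties(Map<String, String> storageProps,
                                 Map<String, String> props, String prefix) {
        for (Map.Entry<String, String> e : storageProps.entrySet()) {
            props.put(prefix + e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("database", "sales_db"); // hypothetical sample keys
        source.put("table", "clicks");
        Map<String, String> props = new LinkedHashMap<>();
        prefixProperties(source, props, SOURCE_PREFIX);
        // prints {falcon.source.database=sales_db, falcon.source.table=clicks}
        System.out.println(props);
    }
}
```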

It looks like all tables go through the same export path. Can export and import be avoided
for external tables, or does export/import already detect that a table is external and let
you short-circuit this?
{code}
+    <decision name="replication-decision">
+        <switch>
+            <case to="table-export">
+                ${feedStorageType == "TABLE"}
+            </case>
+            <default to="replication"/>
+        </switch>
+    </decision>
{code}
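To make the question concrete, here is a hypothetical Java model of the routing it implies. The node names come from the decision excerpt; the `isExternal` flag is an assumption (the patch has no such parameter) and models the short-circuit being asked about, where an external table's data would be copied directly instead of going through export/import. Any non-TABLE storage type falls through to plain replication, as the `<default>` case does.

```java
final class ReplicationRouting {
    // Workflow node names taken from the decision excerpt under review.
    static final String TABLE_EXPORT = "table-export";
    static final String REPLICATION = "replication";

    private ReplicationRouting() {}

    /**
     * Picks the next workflow node. isExternal is hypothetical: it models
     * skipping export/import for external tables, which the review asks about.
     */
    static String nextNode(String feedStorageType, boolean isExternal) {
        if ("TABLE".equals(feedStorageType) && !isExternal) {
            return TABLE_EXPORT;
        }
        return REPLICATION;
    }

    public static void main(String[] args) {
        System.out.println(nextNode("TABLE", false));      // table-export
        System.out.println(nextNode("TABLE", true));       // replication
        System.out.println(nextNode("FILESYSTEM", false)); // replication
    }
}
```

Note that skipping export/import for external tables would copy only the data; whether the target metastore still needs its partitions registered is part of the open question.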

This seems to use distcp v1, which is not desirable.
{code}
+    <!-- Table Replication - Import data and metadata from HDFS Staging into Target Hive -->
+    <action name="table-replication">
+        <distcp xmlns="uri:oozie:distcp-action:0.1">
+            <job-tracker>${targetJobTracker}</job-tracker>
+            <name-node>${targetNameNode}</name-node>
+            <configuration>
+                <property>
+                    <name>mapred.job.queue.name</name>
+                    <value>${queueName}</value>
+                </property>
+            </configuration>
+            <arg>${sourceStagingDir}/${nominalTime}</arg>
+            <arg>${targetStagingDir}/${nominalTime}</arg>
+        </distcp>
+        <ok to="table-import"/>
+        <error to="fail"/>
+    </action>
{code}

It looks like the scenario where data from multiple sources, each owning a partition, is merged
in the target cluster isn't implemented, since the export needs to be specific to the partition
owned by each source cluster. Please confirm.
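A sketch of what a partition-scoped export per source could look like, using Hive's `EXPORT TABLE ... PARTITION (...) TO '...'` syntax. The cluster names, the partition column, the one-partition-per-source mapping and the staging layout are all hypothetical, chosen only to illustrate scoping each export to the partition its source owns:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PerSourceExport {
    private PerSourceExport() {}

    /**
     * Builds one Hive EXPORT statement per source cluster, each restricted to
     * the partition that cluster owns, staged under a per-source directory.
     */
    static List<String> exportStatements(String table, String partitionColumn,
                                         Map<String, String> ownedPartitionBySource,
                                         String stagingDir) {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<String, String> e : ownedPartitionBySource.entrySet()) {
            statements.add(String.format(
                "EXPORT TABLE %s PARTITION (%s='%s') TO '%s/%s'",
                table, partitionColumn, e.getValue(), stagingDir, e.getKey()));
        }
        return statements;
    }

    public static void main(String[] args) {
        Map<String, String> owned = new LinkedHashMap<>();
        owned.put("clusterA", "us"); // hypothetical: each source owns one partition value
        owned.put("clusterB", "eu");
        exportStatements("clicks", "region", owned, "/falcon/staging")
            .forEach(System.out::println);
    }
}
```

Each staged export would then be copied to the target and imported into the same table, so the merged table ends up with one partition per source.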

> Replication to handle hive table replication
> --------------------------------------------
>
>                 Key: FALCON-93
>                 URL: https://issues.apache.org/jira/browse/FALCON-93
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>         Attachments: FALCON-93.patch, FALCON-93-r1.patch
>
>
> Data and metadata to be replicated atomically.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
