falcon-dev mailing list archives

From "Karishma Gulati (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FALCON-234) Replication feed not respecting partition filter instead copying all data to target
Date Mon, 28 Apr 2014 11:09:15 GMT

    [ https://issues.apache.org/jira/browse/FALCON-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982912#comment-13982912 ]

Karishma Gulati edited comment on FALCON-234 at 4/28/14 11:07 AM:
------------------------------------------------------------------

Tested this with the following feed.xml:

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="uri:falcon:feed:0.1" name="feed-replication" description="clicks log">
  <partitions>
    <partition name="colo"/>
  </partitions>
  <frequency>minutes(5)</frequency>
  <timezone>UTC</timezone>
  <late-arrival cut-off="hours(1)"/>
  <clusters>
    <cluster name="sourceCluster" type="source">
      <validity start="2014-04-28T09:15Z" end="2099-01-01T00:00Z"/>
      <retention limit="hours(100000)" action="delete"/>
    </cluster>
    <cluster name="targetCluster" type="target" partition="${cluster.colo}" >
      <validity start="2014-04-28T09:15Z" end="2099-01-01T00:00Z"/>
      <retention limit="hours(100000)" action="delete"/>
    </cluster>    
  </clusters>
  <locations>
    <location type="data" path="/tmp/falcon-regression/input-data/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
    <location type="stats" path="/projects/ivory/clicksStats"/>
    <location type="meta" path="/projects/ivory/clicksMetaData"/>
  </locations>
  <ACL owner="testuser" group="group" permission="0x755"/>
  <schema location="/schema/clicks" provider="protobuf"/>
  <properties>
    <property name="field1" value="value1"/>
    <property name="field2" value="value2"/>
  </properties>
</feed>
{code}

The source colo was abc and the target colo was pqr.

PFA the data on source cluster, and on target cluster after replication (file: data_results).

From that, it looks like it works fine!







was (Author: karishmag9):
Tested this with the following feed.xml:

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="uri:falcon:feed:0.1" name="feed-replication" description="clicks log">
  <partitions>
    <partition name="colo"/>
  </partitions>
  <frequency>minutes(5)</frequency>
  <timezone>UTC</timezone>
  <late-arrival cut-off="hours(1)"/>
  <clusters>
    <cluster name="sourceCluster" type="source">
      <validity start="2014-04-28T09:15Z" end="2099-01-01T00:00Z"/>
      <retention limit="hours(100000)" action="delete"/>
    </cluster>
    <cluster name="targetCluster" type="target" partition="${cluster.colo}" >
      <validity start="2014-04-28T09:15Z" end="2099-01-01T00:00Z"/>
      <retention limit="hours(100000)" action="delete"/>
    </cluster>    
  </clusters>
  <locations>
    <location type="data" path="/tmp/falcon-regression/input-data/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}/"/>
    <location type="stats" path="/projects/ivory/clicksStats"/>
    <location type="meta" path="/projects/ivory/clicksMetaData"/>
  </locations>
  <ACL owner="testuser" group="group" permission="0x755"/>
  <schema location="/schema/clicks" provider="protobuf"/>
  <properties>
    <property name="field1" value="value1"/>
    <property name="field2" value="value2"/>
  </properties>
</feed>
{code}

The source colo was abc and the target colo was pqr.

PFA the data on source cluster, and on target cluster after replication (file: data_results).

From that, it looks like it works fine!






> Replication feed not respecting partition filter instead copying all data to target
> -----------------------------------------------------------------------------------
>
>                 Key: FALCON-234
>                 URL: https://issues.apache.org/jira/browse/FALCON-234
>             Project: Falcon
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.4
>            Reporter: Arpit Gupta
>            Assignee: Venkatesh Seetharam
>         Attachments: data_results, feed.xml, oozie-job-conf.txt, yarn-actual-map-task.txt, yarn-oozie-launcher-stdout.txt, yarn-oozie-launcher-syslog.txt
>
>
> A replication feed was set up where the source was cluster-1:8020/data with partition set to ua1. Instead of copying over just cluster-1:8020/data/ua1, the feed copied over all of cluster-1:8020/data.
> This was seen on a cluster running Hadoop 2.2.0 and Oozie 4.0.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
