falcon-dev mailing list archives

From "Pragya Mittal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1807) Late Rerun is not working in distributed mode
Date Wed, 03 Feb 2016 06:55:39 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129922#comment-15129922 ]

Pragya Mittal commented on FALCON-1807:
---------------------------------------

Attaching all the required definitions:
Process :
{noformat}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="ProcessLateRerunTest-agregator-coord16-bb54f97c" xmlns="uri:falcon:process:0.1">
    <clusters>
        <cluster name="ProcessLateRerunTest-corp-bf31e225">
            <validity start="2016-02-03T06:48Z" end="2016-02-03T07:18Z"/>
        </cluster>
    </clusters>
    <parallel>2</parallel>
    <order>FIFO</order>
    <frequency>minutes(5)</frequency>
    <timezone>UTC</timezone>
    <inputs>
        <input name="inputData" feed="ProcessLateRerunTest-raaw-logs16-3d7c1e49" start="now(0,-1)" end="now(0,0)"/>
    </inputs>
    <outputs>
        <output name="outputData" feed="ProcessLateRerunTest-agregated-logs16-975a0d4c" instance="now(0,0)"/>
    </outputs>
    <properties>
        <property name="queueName" value="default"/>
    </properties>
    <workflow path="/tmp/falcon-regression/ProcessLateRerunTest/aggregator"/>
    <retry policy="periodic" delay="minutes(10)" attempts="3"/>
    <late-process policy="periodic" delay="minutes(4)">
        <late-input input="inputData" workflow-path="/tmp/falcon-regression/ProcessLateRerunTest/aggregator"/>
    </late-process>
    <ACL owner="pragya" group="dataqa" permission="*"/>
</process>
{noformat}

Feed1 :
{noformat}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="ProcessLateRerunTest-agregated-logs16-975a0d4c" description="clicks log" xmlns="uri:falcon:feed:0.1">
    <frequency>minutes(5)</frequency>
    <timezone>UTC</timezone>
    <late-arrival cut-off="hours(6)"/>
    <clusters>
        <cluster name="ProcessLateRerunTest-corp-bf31e225" type="source">
            <validity start="2009-01-01T01:00Z" end="2099-12-31T23:59Z"/>
            <retention limit="months(6)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <location type="data" path="/tmp/falcon-regression/ProcessLateRerunTest/output-data/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
        <location type="stats" path="/projects/falcon/clicksStats"/>
        <location type="meta" path="/projects/falcon/clicksMetaData"/>
    </locations>
    <ACL owner="pragya" group="dataqa" permission="*"/>
    <schema location="/schema/clicks" provider="protobuf"/>
    <properties>
        <property name="field5" value="value1"/>
        <property name="field6" value="value2"/>
    </properties>
</feed>
{noformat}

Feed2 :
{noformat}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="ProcessLateRerunTest-raaw-logs16-3d7c1e49" description="clicks log" xmlns="uri:falcon:feed:0.1">
    <frequency>minutes(1)</frequency>
    <timezone>UTC</timezone>
    <late-arrival cut-off="hours(6)"/>
    <clusters>
        <cluster name="ProcessLateRerunTest-corp-bf31e225" type="source">
            <validity start="2009-01-01T00:00Z" end="2099-12-31T23:59Z"/>
            <retention limit="months(6)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <location type="data" path="/tmp/falcon-regression/ProcessLateRerunTest/input/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
        <location type="stats" path="/projects/falcon/clicksStats"/>
        <location type="meta" path="/projects/falcon/clicksMetaData"/>
    </locations>
    <ACL owner="pragya" group="dataqa" permission="*"/>
    <schema location="/schema/clicks" provider="protobuf"/>
    <properties>
        <property name="field3" value="value1"/>
        <property name="field4" value="value2"/>
    </properties>
</feed>
{noformat}

Workflow :
{noformat}
<workflow-app xmlns="uri:oozie:workflow:0.2" name="aggregator-wf">
    <start to="aggregator"/>
    <action name="aggregator">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputData}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityReducer</value>
                </property>
                <property>
                    <name>mapred.map.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputData}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputData}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
{noformat}

> Late Rerun is not working in distributed mode
> ---------------------------------------------
>
>                 Key: FALCON-1807
>                 URL: https://issues.apache.org/jira/browse/FALCON-1807
>             Project: Falcon
>          Issue Type: Bug
>          Components: rerun
>    Affects Versions: 0.9
>            Reporter: Pragya Mittal
>            Assignee: sandeep samudrala
>            Priority: Blocker
>
> Ideally, late rerun runs the instance if and when the data becomes available in the late rerun zone. This is not happening currently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
