falcon-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ajay Yadava (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (FALCON-1149) The 'today' EL date expression is resolving to yesterday's date, for process instance input feed ranges
Date Fri, 10 Apr 2015 12:00:18 GMT

     [ https://issues.apache.org/jira/browse/FALCON-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ajay Yadava reassigned FALCON-1149:
-----------------------------------

    Assignee: Ajay Yadava

> The 'today' EL date expression is resolving to yesterday's date, for process instance
input feed ranges
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FALCON-1149
>                 URL: https://issues.apache.org/jira/browse/FALCON-1149
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.5, 0.6
>         Environment: HDP 2.1 sandbox, HDP 2.2 sandbox; server in UTC
>            Reporter: Alex C
>            Assignee: Ajay Yadava
>
> *Steps to reproduce* 
> 1. Submit a cluster named 'sandbox':
> {code:xml}
> <cluster colo="local" description="Sandbox Cluster" name="sandbox" xmlns="uri:falcon:cluster:0.1">
>   <interfaces>
>     <interface type="readonly" endpoint="hftp://sandbox.hortonworks.com:50070" version="2.2.0"
/>
>     <interface type="write" endpoint="hdfs://sandbox.hortonworks.com:8020" version="2.2.0"
/>
>     <interface type="execute" endpoint="sandbox.hortonworks.com:8050" version="2.2.0"
/>
>     <interface type="workflow" endpoint="http://sandbox.hortonworks.com:11000/oozie/"
version="4.0.0" />
>     <interface type="messaging" endpoint="tcp://sandbox.hortonworks.com:61616?daemon=true"
version="5.1.6" />
>   </interfaces>
>   <locations>
>     <location name="staging" path="/apps/falcon/sandbox/staging" />
>     <location name="temp" path="/tmp" />
>     <location name="working" path="/apps/falcon/sandbox/working" />
>   </locations>
> </cluster>
> {code}
> 2. Submit a feed f1:
> {code:xml}
> <feed name="f1" description="f1" xmlns="uri:falcon:feed:0.1">
>   <frequency>days(1)</frequency>
>   <timezone>UTC</timezone>
>   <late-arrival cut-off="hours(48)" />
>   <clusters>
>     <cluster name="sandbox" type="source">
>       <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
>       <retention limit="months(9999)" action="delete" />
>     </cluster>
>   </clusters>
>   <locations>
>     <location type="data"
>       path="/f1/${YEAR}/${MONTH}/${DAY}" />
>   </locations>
>   <ACL owner="ambari-qa" group="users" permission="0775" />
>   <schema location="/none" provider="none" />
> </feed>
> {code}
> 3. Submit a process p1:
> {code:xml}
> <process name="p1" xmlns="uri:falcon:process:0.1">
>   <clusters>
>     <cluster name="sandbox">
>       <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
>     </cluster>
>   </clusters>
>   <parallel>1</parallel>
>   <order>FIFO</order>
>   <frequency>days(1)</frequency>
>   <outputs>
>     <output name="output" feed="f1" instance="today(0,0)" />
>   </outputs>
>   <properties>
>   </properties>
>   <workflow name="p1-wf" engine="oozie" path="/apps/p1" />
>   <retry policy="periodic" delay="minutes(60)" attempts="24" />
> </process>
> {code}
> 4. Submit a feed f2:
> {code:xml}
> <feed name="f2" description="f2" xmlns="uri:falcon:feed:0.1">
>   <frequency>days(1)</frequency>
>   <timezone>UTC</timezone>
>   <late-arrival cut-off="hours(48)" />
>   <clusters>
>     <cluster name="sandbox" type="source">
>       <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
>       <retention limit="months(9999)" action="delete" />
>     </cluster>
>   </clusters>
>   <locations>
>     <location type="data"
>       path="/f2/${YEAR}/${MONTH}/${DAY}" />
>   </locations>
>   <ACL owner="ambari-qa" group="users" permission="0775" />
>   <schema location="/none" provider="none" />
> </feed>
> {code}
> 5. Submit a process p2:
> {code:xml}
> <process name="p2" xmlns="uri:falcon:process:0.1">
>   <clusters>
>     <cluster name="sandbox">
>       <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
>     </cluster>
>   </clusters>
>   <parallel>1</parallel>
>   <order>FIFO</order>
>   <frequency>days(1)</frequency>
>   <inputs>
>     <input name="input" feed="f1" start="today(0,0)" end="today(0,0)" />
>   </inputs>
>   <outputs>
>     <output name="output" feed="f2" instance="today(0,0)" />
>   </outputs>
>   <workflow name="p2-wf" engine="oozie" path="/apps/p2" />
>   <retry policy="periodic" delay="minutes(60)" attempts="24" />
> </process>
> {code}
> 6. Note that:
> - Process p1 has no input feed (the data is fetched from some other location by p1).
> - Feed f1 is referenced in the output of p1, and also referenced in the input of p2.
> - All feeds are daily, and process input feed ranges and output feeds are daily, by way
of the 'today(0,0)' EL expression.
> 7. Finally, schedule all feeds and processes after 08:30Z on a given day, 'today'..
> *Expected:*
> 1. The first scheduled instance for p1 proceeds to COMPLETED, and produces a partition
in f1 for 'today'
> 2. The first scheduled instance for p2 proceeds to COMPLETED, and produces a partition
in f2 for 'today', since it looks for and finds a corresponding partition for 'today' in f1.
> *Actual:*
> 1. The first scheduled instance for p1 proceeds to COMPLETED, and produces a partition
in f1 for 'today'
> 2. However, the first scheduled instance for p2 is left in WAITING state, since it is
looking for a partition in f1 for 'yesterday', which does not exist (and will never exist).
> I am currently working around this unexpected behaviour by specifying the input feed
range start and end for p2 as 'today(24,0)' instead of 'today(0,0)'
> Please advise if this is indeed a) a bug or b) a mistake in the configuration.
> Many thanks,



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message