falcon-dev mailing list archives

From "Alex C (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FALCON-1149) The 'today' EL date expression is resolving to yesterday's date, for process instance input feed ranges
Date Mon, 13 Apr 2015 13:01:12 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492373#comment-14492373 ]

Alex C edited comment on FALCON-1149 at 4/13/15 1:00 PM:
---------------------------------------------------------

Hi there,

Just to let you know, I tried your suggestion of using 'today(24,0)' for the end only, and
unfortunately it doesn't work (I still get the same error, where the input is left in WAITING state).
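
A sketch of the variation described above, assuming the start was left at 'today(0,0)' (which is not stated here); only the input element of p2 is shown:

{code:xml}
<inputs>
  <!-- assumed: start unchanged, only the end moved to 'today(24,0)' -->
  <input name="input" feed="f1" start="today(0,0)" end="today(24,0)" />
</inputs>
{code}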

Also, the workaround I am using doesn't quite produce the desired results. It resolves
the WAITING problem, and p2 can proceed to COMPLETED, but consider the partition location
specified in f2:

{code}
/f2/${YEAR}/${MONTH}/${DAY}
{code}

I tested again, and unfortunately the day written is yesterday instead of today. I suspect
this is due to the same bug?

Would you happen to know if a workaround is also possible for the feed location?

Thanks


was (Author: alza):
Hi there,

Just to let you know, I tried your suggestion of using 'today(24,0)' for the end only, and
unfortunately it doesn't work (I still get the same error, where the input is left in WAITING state).

Also, unfortunately the workaround I am using doesn't quite produce the desired results:
although it resolves the WAITING problem, and p2 can proceed to COMPLETED, the partition specification
in f2 {{/f2/${YEAR}/${MONTH}/${DAY}}} means that the day written is yesterday instead of today.

Would you happen to know if a workaround is also possible for the specification '${DAY}'?

Thanks

> The 'today' EL date expression is resolving to yesterday's date, for process instance input feed ranges
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FALCON-1149
>                 URL: https://issues.apache.org/jira/browse/FALCON-1149
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.5, 0.6
>         Environment: HDP 2.1 sandbox, HDP 2.2 sandbox; server in UTC
>            Reporter: Alex C
>            Assignee: Ajay Yadava
>
> *Steps to reproduce* 
> 1. Submit a cluster named 'sandbox':
> {code:xml}
> <cluster colo="local" description="Sandbox Cluster" name="sandbox" xmlns="uri:falcon:cluster:0.1">
>   <interfaces>
>     <interface type="readonly" endpoint="hftp://sandbox.hortonworks.com:50070" version="2.2.0" />
>     <interface type="write" endpoint="hdfs://sandbox.hortonworks.com:8020" version="2.2.0" />
>     <interface type="execute" endpoint="sandbox.hortonworks.com:8050" version="2.2.0" />
>     <interface type="workflow" endpoint="http://sandbox.hortonworks.com:11000/oozie/" version="4.0.0" />
>     <interface type="messaging" endpoint="tcp://sandbox.hortonworks.com:61616?daemon=true" version="5.1.6" />
>   </interfaces>
>   <locations>
>     <location name="staging" path="/apps/falcon/sandbox/staging" />
>     <location name="temp" path="/tmp" />
>     <location name="working" path="/apps/falcon/sandbox/working" />
>   </locations>
> </cluster>
> {code}
> 2. Submit a feed f1:
> {code:xml}
> <feed name="f1" description="f1" xmlns="uri:falcon:feed:0.1">
>   <frequency>days(1)</frequency>
>   <timezone>UTC</timezone>
>   <late-arrival cut-off="hours(48)" />
>   <clusters>
>     <cluster name="sandbox" type="source">
>       <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
>       <retention limit="months(9999)" action="delete" />
>     </cluster>
>   </clusters>
>   <locations>
>     <location type="data"
>       path="/f1/${YEAR}/${MONTH}/${DAY}" />
>   </locations>
>   <ACL owner="ambari-qa" group="users" permission="0775" />
>   <schema location="/none" provider="none" />
> </feed>
> {code}
> 3. Submit a process p1:
> {code:xml}
> <process name="p1" xmlns="uri:falcon:process:0.1">
>   <clusters>
>     <cluster name="sandbox">
>       <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
>     </cluster>
>   </clusters>
>   <parallel>1</parallel>
>   <order>FIFO</order>
>   <frequency>days(1)</frequency>
>   <outputs>
>     <output name="output" feed="f1" instance="today(0,0)" />
>   </outputs>
>   <properties>
>   </properties>
>   <workflow name="p1-wf" engine="oozie" path="/apps/p1" />
>   <retry policy="periodic" delay="minutes(60)" attempts="24" />
> </process>
> {code}
> 4. Submit a feed f2:
> {code:xml}
> <feed name="f2" description="f2" xmlns="uri:falcon:feed:0.1">
>   <frequency>days(1)</frequency>
>   <timezone>UTC</timezone>
>   <late-arrival cut-off="hours(48)" />
>   <clusters>
>     <cluster name="sandbox" type="source">
>       <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
>       <retention limit="months(9999)" action="delete" />
>     </cluster>
>   </clusters>
>   <locations>
>     <location type="data"
>       path="/f2/${YEAR}/${MONTH}/${DAY}" />
>   </locations>
>   <ACL owner="ambari-qa" group="users" permission="0775" />
>   <schema location="/none" provider="none" />
> </feed>
> {code}
> 5. Submit a process p2:
> {code:xml}
> <process name="p2" xmlns="uri:falcon:process:0.1">
>   <clusters>
>     <cluster name="sandbox">
>       <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
>     </cluster>
>   </clusters>
>   <parallel>1</parallel>
>   <order>FIFO</order>
>   <frequency>days(1)</frequency>
>   <inputs>
>     <input name="input" feed="f1" start="today(0,0)" end="today(0,0)" />
>   </inputs>
>   <outputs>
>     <output name="output" feed="f2" instance="today(0,0)" />
>   </outputs>
>   <workflow name="p2-wf" engine="oozie" path="/apps/p2" />
>   <retry policy="periodic" delay="minutes(60)" attempts="24" />
> </process>
> {code}
> 6. Note that:
> - Process p1 has no input feed (the data is fetched from some other location by p1).
> - Feed f1 is referenced in the output of p1, and also referenced in the input of p2.
> - All feeds are daily, and process input feed ranges and output feeds are daily, by way of the 'today(0,0)' EL expression.
> 7. Finally, schedule all feeds and processes after 08:30Z on a given day, 'today'.
> *Expected:*
> 1. The first scheduled instance for p1 proceeds to COMPLETED, and produces a partition in f1 for 'today'.
> 2. The first scheduled instance for p2 proceeds to COMPLETED, and produces a partition in f2 for 'today', since it looks for and finds a corresponding partition for 'today' in f1.
> *Actual:*
> 1. The first scheduled instance for p1 proceeds to COMPLETED, and produces a partition in f1 for 'today'.
> 2. However, the first scheduled instance for p2 is left in WAITING state, since it is looking for a partition in f1 for 'yesterday', which does not exist (and will never exist).
> I am currently working around this unexpected behaviour by specifying the input feed range start and end for p2 as 'today(24,0)' instead of 'today(0,0)'.
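> A minimal sketch of that workaround, showing only the changed input element of p2 (the rest of p2 is assumed to stay as in step 5). It sidesteps the WAITING state but is not a fix:
> {code:xml}
> <inputs>
>   <!-- workaround: 'today(24,0)' in place of 'today(0,0)' for both start and end -->
>   <input name="input" feed="f1" start="today(24,0)" end="today(24,0)" />
> </inputs>
> {code}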
> Please advise if this is indeed a) a bug or b) a mistake in the configuration.
> Many thanks,



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
