falcon-dev mailing list archives

From "Mass Dosage (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1149) The 'today' EL date expression is resolving to yesterday's date, for process instance input feed ranges
Date Tue, 28 Apr 2015 11:47:06 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516896#comment-14516896 ]

Mass Dosage commented on FALCON-1149:
-------------------------------------

We checked both "oozie-site.xml" and "oozie-default.xml" in our production environment and
in the sandbox we used to reproduce this issue, and the "oozie.processing.timezone" property
is not set in any of them. As we understand it, when this property is absent Oozie falls back
to its default value, which according to the Oozie documentation is UTC.
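
For reference, setting the property explicitly in "oozie-site.xml" would look roughly like
the fragment below. This is only a sketch of the standard property syntax with the documented
default value filled in; it is not copied from either environment.

{code:xml}
<!-- Illustrative only: pins the Oozie processing timezone to the documented default -->
<property>
  <name>oozie.processing.timezone</name>
  <value>UTC</value>
</property>
{code}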

> The 'today' EL date expression is resolving to yesterday's date, for process instance input feed ranges
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FALCON-1149
>                 URL: https://issues.apache.org/jira/browse/FALCON-1149
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.5, 0.6
>         Environment: HDP 2.1 sandbox, HDP 2.2 sandbox; server in UTC
>            Reporter: Alex C
>            Assignee: Ajay Yadava
>
> *Steps to reproduce* 
> 1. Submit a cluster named 'sandbox':
> {code:xml}
> <cluster colo="local" description="Sandbox Cluster" name="sandbox" xmlns="uri:falcon:cluster:0.1">
>   <interfaces>
>     <interface type="readonly" endpoint="hftp://sandbox.hortonworks.com:50070" version="2.2.0" />
>     <interface type="write" endpoint="hdfs://sandbox.hortonworks.com:8020" version="2.2.0" />
>     <interface type="execute" endpoint="sandbox.hortonworks.com:8050" version="2.2.0" />
>     <interface type="workflow" endpoint="http://sandbox.hortonworks.com:11000/oozie/" version="4.0.0" />
>     <interface type="messaging" endpoint="tcp://sandbox.hortonworks.com:61616?daemon=true" version="5.1.6" />
>   </interfaces>
>   <locations>
>     <location name="staging" path="/apps/falcon/sandbox/staging" />
>     <location name="temp" path="/tmp" />
>     <location name="working" path="/apps/falcon/sandbox/working" />
>   </locations>
> </cluster>
> {code}
> 2. Submit a feed f1:
> {code:xml}
> <feed name="f1" description="f1" xmlns="uri:falcon:feed:0.1">
>   <frequency>days(1)</frequency>
>   <timezone>UTC</timezone>
>   <late-arrival cut-off="hours(48)" />
>   <clusters>
>     <cluster name="sandbox" type="source">
>       <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
>       <retention limit="months(9999)" action="delete" />
>     </cluster>
>   </clusters>
>   <locations>
>     <location type="data"
>       path="/f1/${YEAR}/${MONTH}/${DAY}" />
>   </locations>
>   <ACL owner="ambari-qa" group="users" permission="0775" />
>   <schema location="/none" provider="none" />
> </feed>
> {code}
> 3. Submit a process p1:
> {code:xml}
> <process name="p1" xmlns="uri:falcon:process:0.1">
>   <clusters>
>     <cluster name="sandbox">
>       <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
>     </cluster>
>   </clusters>
>   <parallel>1</parallel>
>   <order>FIFO</order>
>   <frequency>days(1)</frequency>
>   <outputs>
>     <output name="output" feed="f1" instance="today(0,0)" />
>   </outputs>
>   <properties>
>   </properties>
>   <workflow name="p1-wf" engine="oozie" path="/apps/p1" />
>   <retry policy="periodic" delay="minutes(60)" attempts="24" />
> </process>
> {code}
> 4. Submit a feed f2:
> {code:xml}
> <feed name="f2" description="f2" xmlns="uri:falcon:feed:0.1">
>   <frequency>days(1)</frequency>
>   <timezone>UTC</timezone>
>   <late-arrival cut-off="hours(48)" />
>   <clusters>
>     <cluster name="sandbox" type="source">
>       <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
>       <retention limit="months(9999)" action="delete" />
>     </cluster>
>   </clusters>
>   <locations>
>     <location type="data"
>       path="/f2/${YEAR}/${MONTH}/${DAY}" />
>   </locations>
>   <ACL owner="ambari-qa" group="users" permission="0775" />
>   <schema location="/none" provider="none" />
> </feed>
> {code}
> 5. Submit a process p2:
> {code:xml}
> <process name="p2" xmlns="uri:falcon:process:0.1">
>   <clusters>
>     <cluster name="sandbox">
>       <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
>     </cluster>
>   </clusters>
>   <parallel>1</parallel>
>   <order>FIFO</order>
>   <frequency>days(1)</frequency>
>   <inputs>
>     <input name="input" feed="f1" start="today(0,0)" end="today(0,0)" />
>   </inputs>
>   <outputs>
>     <output name="output" feed="f2" instance="today(0,0)" />
>   </outputs>
>   <workflow name="p2-wf" engine="oozie" path="/apps/p2" />
>   <retry policy="periodic" delay="minutes(60)" attempts="24" />
> </process>
> {code}
> 6. Note that:
> - Process p1 has no input feed (the data is fetched from some other location by p1).
> - Feed f1 is referenced in the output of p1, and also referenced in the input of p2.
> - All feeds are daily, and process input feed ranges and output feeds are daily, by way of the 'today(0,0)' EL expression.
> 7. Finally, schedule all feeds and processes after 08:30Z on a given day, 'today'.
> *Expected:*
> 1. The first scheduled instance for p1 proceeds to COMPLETED, and produces a partition in f1 for 'today'.
> 2. The first scheduled instance for p2 proceeds to COMPLETED, and produces a partition in f2 for 'today', since it looks for and finds a corresponding partition for 'today' in f1.
> *Actual:*
> 1. The first scheduled instance for p1 proceeds to COMPLETED, and produces a partition in f1 for 'today'.
> 2. However, the first scheduled instance for p2 is left in WAITING state, since it is looking for a partition in f1 for 'yesterday', which does not exist (and will never exist).
> I am currently working around this unexpected behaviour by specifying the input feed range start and end for p2 as 'today(24,0)' instead of 'today(0,0)'.
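> For illustration, the workaround would presumably change only the input element of p2, along these lines (a sketch; every other attribute is copied unchanged from step 5):
> {code:xml}
> <!-- Workaround sketch: range offset by 24 hours, as described in the workaround above -->
> <input name="input" feed="f1" start="today(24,0)" end="today(24,0)" />
> {code}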
> Please advise if this is indeed a) a bug or b) a mistake in the configuration.
> Many thanks,



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
