falcon-dev mailing list archives

From "Ajay Yadava (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1149) The 'today' EL date expression is resolving to yesterday's date, for process instance input feed ranges
Date Fri, 26 Jun 2015 07:10:05 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602499#comment-14602499 ]

Ajay Yadava commented on FALCON-1149:
-------------------------------------

Hi [~alza],

Sorry for the delayed response on this. I tried reproducing your scenario again with the same
definitions, but I was not able to reproduce it. According to the data you have given, the
first instance generated by p1 will also be the instance consumed by p2. This is also what I
get when I run the scenario myself, so the first instance of p2 shouldn't be stuck.
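
The reasoning above (p1's first output instance is the same instance p2 reads) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Falcon's implementation: it reads today(h,m) as "midnight UTC of the instance's nominal day plus the offset", it uses a hypothetical nominal time because the actual <TODAY> value is not given in this thread, and it ignores anything Falcon or Oozie may do with the feed's own validity start.

```python
from datetime import datetime, timedelta, timezone

F1_TEMPLATE = "/f1/${YEAR}/${MONTH}/${DAY}"  # path template from feed f1


def today(nominal, hours, minutes):
    """Assumed reading of today(h,m): midnight UTC of the nominal day plus the offset."""
    midnight = nominal.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return midnight + timedelta(hours=hours, minutes=minutes)


def expand(template, instant):
    """Fill the ${YEAR}/${MONTH}/${DAY} variables of a feed path template."""
    return (
        template.replace("${YEAR}", f"{instant:%Y}")
        .replace("${MONTH}", f"{instant:%m}")
        .replace("${DAY}", f"{instant:%d}")
    )


# Hypothetical nominal time standing in for <TODAY>T08:30Z (not from the thread).
nominal = datetime(2015, 6, 1, 8, 30, tzinfo=timezone.utc)

p1_output = expand(F1_TEMPLATE, today(nominal, 0, 0))  # p1 output: today(0,0)
p2_input = expand(F1_TEMPLATE, today(nominal, 0, 0))   # p2 input: today(0,0)

print("p1 writes:", p1_output)
print("p2 reads: ", p2_input)
```

Under this reading both expressions expand to the same partition path, which is the behaviour described above. Comparing the printed path with the full path p2 actually waits on would show where the two diverge.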

Can you please schedule the processes and feeds again and provide the following information?
1) The exact start date and time for p1, p2, f1 and f2. You have <TODAY> specified in both
process definitions. (Use the entity definition command in the CLI: falcon entity -type process -name p1 -definition)
2) The exact instance produced by p1 and the exact (full) path being searched for by p2.
3) How exactly the feed path is created in p1 (e.g. if you are using Pig, can you paste the
line containing the STORE command?).
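
To collect the definitions asked for in item 1 in one pass, a small helper along these lines may be handy. It is a sketch only: the feed commands are assumed to follow the same form as the process command quoted above, the helper names are made up for illustration, and the script just prints the commands rather than running them.

```python
# Entities discussed in this thread, as (type, name) pairs.
ENTITIES = [("process", "p1"), ("process", "p2"), ("feed", "f1"), ("feed", "f2")]


def definition_command(entity_type, name):
    """Build the CLI arguments for fetching one entity definition."""
    return ["falcon", "entity", "-type", entity_type, "-name", name, "-definition"]


def validity_lines(definition_xml):
    """Pick out the <validity .../> lines, which carry the start and end times."""
    return [
        line.strip()
        for line in definition_xml.splitlines()
        if "<validity" in line
    ]


# Print the commands so they can be pasted into a shell on the Falcon host.
for entity_type, name in ENTITIES:
    print(" ".join(definition_command(entity_type, name)))
```

Passing each command's output through validity_lines leaves just the validity element, which is the part that shows the resolved start time.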


> The 'today' EL date expression is resolving to yesterday's date, for process instance input feed ranges
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FALCON-1149
>                 URL: https://issues.apache.org/jira/browse/FALCON-1149
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.5, 0.6
>         Environment: HDP 2.1 sandbox, HDP 2.2 sandbox; server in UTC
>            Reporter: Alex C
>            Assignee: Ajay Yadava
>             Fix For: 0.6.1
>
>
> *Steps to reproduce* 
> 1. Submit a cluster named 'sandbox':
> {code:xml}
> <cluster colo="local" description="Sandbox Cluster" name="sandbox" xmlns="uri:falcon:cluster:0.1">
>   <interfaces>
>     <interface type="readonly" endpoint="hftp://sandbox.hortonworks.com:50070" version="2.2.0" />
>     <interface type="write" endpoint="hdfs://sandbox.hortonworks.com:8020" version="2.2.0" />
>     <interface type="execute" endpoint="sandbox.hortonworks.com:8050" version="2.2.0" />
>     <interface type="workflow" endpoint="http://sandbox.hortonworks.com:11000/oozie/" version="4.0.0" />
>     <interface type="messaging" endpoint="tcp://sandbox.hortonworks.com:61616?daemon=true" version="5.1.6" />
>   </interfaces>
>   <locations>
>     <location name="staging" path="/apps/falcon/sandbox/staging" />
>     <location name="temp" path="/tmp" />
>     <location name="working" path="/apps/falcon/sandbox/working" />
>   </locations>
> </cluster>
> {code}
> 2. Submit a feed f1:
> {code:xml}
> <feed name="f1" description="f1" xmlns="uri:falcon:feed:0.1">
>   <frequency>days(1)</frequency>
>   <timezone>UTC</timezone>
>   <late-arrival cut-off="hours(48)" />
>   <clusters>
>     <cluster name="sandbox" type="source">
>       <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
>       <retention limit="months(9999)" action="delete" />
>     </cluster>
>   </clusters>
>   <locations>
>     <location type="data"
>       path="/f1/${YEAR}/${MONTH}/${DAY}" />
>   </locations>
>   <ACL owner="ambari-qa" group="users" permission="0775" />
>   <schema location="/none" provider="none" />
> </feed>
> {code}
> 3. Submit a process p1:
> {code:xml}
> <process name="p1" xmlns="uri:falcon:process:0.1">
>   <clusters>
>     <cluster name="sandbox">
>       <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
>     </cluster>
>   </clusters>
>   <parallel>1</parallel>
>   <order>FIFO</order>
>   <frequency>days(1)</frequency>
>   <outputs>
>     <output name="output" feed="f1" instance="today(0,0)" />
>   </outputs>
>   <properties>
>   </properties>
>   <workflow name="p1-wf" engine="oozie" path="/apps/p1" />
>   <retry policy="periodic" delay="minutes(60)" attempts="24" />
> </process>
> {code}
> 4. Submit a feed f2:
> {code:xml}
> <feed name="f2" description="f2" xmlns="uri:falcon:feed:0.1">
>   <frequency>days(1)</frequency>
>   <timezone>UTC</timezone>
>   <late-arrival cut-off="hours(48)" />
>   <clusters>
>     <cluster name="sandbox" type="source">
>       <validity start="2013-01-01T13:00Z" end="2099-12-31T13:00Z" />
>       <retention limit="months(9999)" action="delete" />
>     </cluster>
>   </clusters>
>   <locations>
>     <location type="data"
>       path="/f2/${YEAR}/${MONTH}/${DAY}" />
>   </locations>
>   <ACL owner="ambari-qa" group="users" permission="0775" />
>   <schema location="/none" provider="none" />
> </feed>
> {code}
> 5. Submit a process p2:
> {code:xml}
> <process name="p2" xmlns="uri:falcon:process:0.1">
>   <clusters>
>     <cluster name="sandbox">
>       <validity start="<TODAY>T08:30Z" end="2099-12-31T00:00Z"/>
>     </cluster>
>   </clusters>
>   <parallel>1</parallel>
>   <order>FIFO</order>
>   <frequency>days(1)</frequency>
>   <inputs>
>     <input name="input" feed="f1" start="today(0,0)" end="today(0,0)" />
>   </inputs>
>   <outputs>
>     <output name="output" feed="f2" instance="today(0,0)" />
>   </outputs>
>   <workflow name="p2-wf" engine="oozie" path="/apps/p2" />
>   <retry policy="periodic" delay="minutes(60)" attempts="24" />
> </process>
> {code}
> 6. Note that:
> - Process p1 has no input feed (the data is fetched from some other location by p1).
> - Feed f1 is referenced in the output of p1, and also referenced in the input of p2.
> - All feeds are daily, and process input feed ranges and output feeds are daily, by way of the 'today(0,0)' EL expression.
> 7. Finally, schedule all feeds and processes after 08:30Z on a given day, 'today'.
> *Expected:*
> 1. The first scheduled instance for p1 proceeds to COMPLETED, and produces a partition in f1 for 'today'
> 2. The first scheduled instance for p2 proceeds to COMPLETED, and produces a partition in f2 for 'today', since it looks for and finds a corresponding partition for 'today' in f1.
> *Actual:*
> 1. The first scheduled instance for p1 proceeds to COMPLETED, and produces a partition in f1 for 'today'
> 2. However, the first scheduled instance for p2 is left in WAITING state, since it is looking for a partition in f1 for 'yesterday', which does not exist (and will never exist).
> I am currently working around this unexpected behaviour by specifying the input feed range start and end for p2 as 'today(24,0)' instead of 'today(0,0)'
> Please advise if this is indeed a) a bug or b) a mistake in the configuration.
> Many thanks,



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
