falcon-dev mailing list archives

From "sandeep samudrala (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1852) Optional Input for a process not truly optional
Date Fri, 11 Mar 2016 16:34:16 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191118#comment-15191118 ]

sandeep samudrala commented on FALCON-1852:

It's very much right to do this, as there have been multiple occasions where users have reported
job failures caused by optional inputs not being evaluated (data not being present).

The only issue I see is the case where the data for a path is not yet completely available (for
feeds with an availability flag, the mere existence of the directory does not mean the data is
complete), in which case half-baked data might be consumed by the process. This case can be
written off by saying that optional means consuming whatever data is available.
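
To make the concern concrete, here is a minimal sketch of the difference between a directory
that merely exists and one whose availability flag is present. It is not Falcon code: the
helper name is_instance_complete and the flag name _SUCCESS are assumptions for illustration,
and the local filesystem stands in for HDFS.

```python
import os
import tempfile


def is_instance_complete(path, availability_flag=None):
    """Return True if a feed instance directory can be treated as complete.

    Without an availability flag, mere existence of the directory counts.
    With one, the flag file must also be present inside the directory.
    Hypothetical helper; the local filesystem stands in for HDFS.
    """
    if not os.path.isdir(path):
        return False
    if availability_flag is None:
        return True
    return os.path.exists(os.path.join(path, availability_flag))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        half_baked = os.path.join(root, "00")
        done = os.path.join(root, "01")
        os.makedirs(half_baked)
        os.makedirs(done)
        # Only the second instance has its flag file written.
        open(os.path.join(done, "_SUCCESS"), "w").close()
        print(is_instance_complete(half_baked))              # existence only
        print(is_instance_complete(half_baked, "_SUCCESS"))  # flag required
        print(is_instance_complete(done, "_SUCCESS"))        # flag required
```

With an existence-only check the first instance would be handed to the process even though
its flag has not been written yet, which is the half-baked case described above.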

I am all thumbs up for the above approach too.

> Optional Input for a process not truly optional
> -----------------------------------------------
>                 Key: FALCON-1852
>                 URL: https://issues.apache.org/jira/browse/FALCON-1852
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Pallavi Rao
>            Assignee: Pallavi Rao
> Currently, when a feed input is marked as optional, we do not add it to the coordinator definition's datasets. This means we do not wait for all instances (for a given data window) to arrive. Instead, we just resolve the paths for a data window and pass it as a parameter.
> For example:
> {noformat}
> <inputs>
>     <!-- In the workflow, the input paths will be available in a variable 'inpaths' -->
>     <input name="inpaths" feed="in" start="now(0,-5)" end="now(0,-1)"/>
>     <input name="in2paths" feed="in2" start="now(0,-5)" end="now(0,-1)" optional="true"/>
> </inputs>
> {noformat}
> For a process instance 2013-01-01T00:00Z, the optional input, in2paths, will be resolved as below:
> {noformat}
>  <property>
>     <name>in2paths</name>
>     <value>hdfs://localhost:9000/data/in2/2013/11/15/00/04,hdfs://localhost:9000/data/in2/2013/11/15/00/03,hdfs://localhost:9000/data/in2/2013/11/15/00/02,hdfs://localhost:9000/data/in2/2013/11/15/00/01,hdfs://localhost:9000/data/in2/2013/11/15/00/00</value>
>   </property>
> {noformat}
> If one of the instances of in2paths (for example, hdfs://localhost:9000/data/in2/2013/11/15/00/04) is missing, the workflow will fail anyway.
> Hence the input in2paths is not truly optional; the only difference is that the triggering of the instance is not gated on it.
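
One way to picture a "truly optional" input is to drop the missing instances from the resolved
value before the workflow sees it. The sketch below is only an illustration under that
assumption and is not necessarily the fix adopted for FALCON-1852; keep_existing_paths and its
exists parameter are hypothetical names, and the existence check is injected so that an HDFS
lookup could be substituted.

```python
import os


def keep_existing_paths(resolved_value, exists=os.path.exists):
    """Drop missing instances from a comma-separated list of resolved paths.

    `exists` is any callable that reports whether a path is present; it
    defaults to a local filesystem check (an assumption for this sketch).
    """
    paths = [p for p in resolved_value.split(",") if p]
    return ",".join(p for p in paths if exists(p))


if __name__ == "__main__":
    # Rebuild the in2paths value from the example: minutes 04 down to 00.
    value = ",".join(
        "hdfs://localhost:9000/data/in2/2013/11/15/00/%02d" % minute
        for minute in (4, 3, 2, 1, 0)
    )
    # Pretend the 00/04 instance never arrived.
    missing = {"hdfs://localhost:9000/data/in2/2013/11/15/00/04"}
    print(keep_existing_paths(value, exists=lambda p: p not in missing))
```

If every instance is missing, the result is an empty string, so a workflow using this would
still have to handle an empty in2paths value.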

This message was sent by Atlassian JIRA
