falcon-dev mailing list archives

From "Pallavi Rao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1676) When a partition is specified in input feed, Falcon should only wait for data availability in a partition
Date Mon, 21 Dec 2015 05:29:46 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066062#comment-15066062 ]

Pallavi Rao commented on FALCON-1676:
-------------------------------------

This happens because of the way Falcon creates the coordinator definition.
{code}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<coordinator-app xmlns="uri:oozie:coordinator:0.3" name="FALCON_PROCESS_DEFAULT_DP-BaseSummaryProcess" frequency="${coord:minutes(30)}" start="2014-12-09T10:00Z" end="2099-01-01T00:00Z" timezone="UTC">
..
    <datasets>
        <dataset name="ConversionEnhance" frequency="${coord:minutes(30)}" initial-instance="2013-02-26T08:00Z" timezone="UTC">
            <uri-template>hdfs://emerald/data/fetl/conversionenhance/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
            <done-flag>_SUCCESS</done-flag>
        </dataset>
...
                <property>
                    <name>ConversionEnhance</name>
                    <value>${dataIn('ConversionEnhance', '*/{MATCH}')}</value>
                </property>
{code}

The base directory (without the partition) is specified as the dataset on which the coordinator
waits for data to become available. The input path passed to the process, however, is resolved
with the partition appended.
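
To make the mismatch concrete, here is a minimal Python sketch. It is not Falcon or Oozie code; the helper names (resolve_instance, availability_path, input_path) and the sample instance time are hypothetical, and it assumes the done-flag file is looked up directly under the resolved dataset instance directory. The URI template, done flag and partition are copied from the definition above.

```python
from datetime import datetime

# Values copied from the coordinator definition and process input above.
URI_TEMPLATE = (
    "hdfs://emerald/data/fetl/conversionenhance/"
    "${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"
)
DONE_FLAG = "_SUCCESS"
PARTITION = "*/{MATCH}"


def resolve_instance(template: str, instant: datetime) -> str:
    """Substitute the time variables of a uri-template (hypothetical helper)."""
    values = {
        "${YEAR}": f"{instant.year:04d}",
        "${MONTH}": f"{instant.month:02d}",
        "${DAY}": f"{instant.day:02d}",
        "${HOUR}": f"{instant.hour:02d}",
        "${MINUTE}": f"{instant.minute:02d}",
    }
    for token, value in values.items():
        template = template.replace(token, value)
    return template


def availability_path(instance_dir: str, done_flag: str) -> str:
    """Path the coordinator waits on: the done flag in the base directory."""
    return f"{instance_dir}/{done_flag}"


def input_path(instance_dir: str, partition: str) -> str:
    """Path handed to the process: the base directory plus the partition."""
    return f"{instance_dir}/{partition}"


# Sample instance time, chosen only for illustration.
instance_dir = resolve_instance(URI_TEMPLATE, datetime(2015, 12, 21, 5, 0))
print("waits on:", availability_path(instance_dir, DONE_FLAG))
print("reads   :", input_path(instance_dir, PARTITION))
```

The two printed paths differ: the wait is tied to the parent directory, while the process only ever reads the partition below it.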




> When a partition is specified in input feed, Falcon should only wait for data availability in a partition
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: FALCON-1676
>                 URL: https://issues.apache.org/jira/browse/FALCON-1676
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Pallavi Rao
>
> When a process uses a feed with a partition as its input, Falcon waits for data to be available in all partitions (the parent directory), rather than waiting only for data in that particular partition.
> Example process input:
> {code}
> <inputs>
>         <input name="ConversionEnhance" feed="FETL-ConversionEnhance" start="now(0,-30)" end="now(0,-30)" partition="*/{MATCH}"/>
> {code}
> If the consumer doesn't want to wait for all the data to become available and cares only about data in that partition, the user is currently forced to create a feed per partition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
