falcon-dev mailing list archives

From "Ajay Yadava (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1644) Retention : Some feed instances are never deleted by retention jobs.
Date Tue, 08 Dec 2015 05:51:10 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046413#comment-15046413 ]

Ajay Yadava commented on FALCON-1644:
-------------------------------------

I also think this is a bug and should be fixed. However, since the expected behaviour was never
clearly documented, some users might be relying on the current behaviour. So I suggest a
transitional approach: introduce a configuration option, probably in startup.properties, in one
release, and change the behaviour completely in the next release. To avoid surprises, we can keep
the default consistent with the old behaviour in the first release and announce that it will
change in the next one. This way users opt in to the new behaviour during the transition instead
of having to opt out of it.
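
A minimal sketch of the opt-in idea, written in Python only for illustration. The property
name below is hypothetical (invented for this sketch, not an existing Falcon setting), and
the "validity end + retention limit" rule is the one proposed in the issue description.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical property name, invented for this sketch; not an existing Falcon setting.
EXTEND_KEY = "*.falcon.retention.extend.past.validity"


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def retention_end(validity_end, limit, props):
    """Pick the end of the retention window based on the opt-in flag."""
    # A missing key means "false", so existing deployments keep the old behaviour.
    if props.get(EXTEND_KEY, "false").lower() == "true":
        return validity_end + limit
    return validity_end


validity_end = datetime(2015, 10, 30, 10, 0, tzinfo=timezone.utc)
limit = timedelta(hours=10)

old = retention_end(validity_end, limit, parse_properties(""))
new = retention_end(validity_end, limit, parse_properties(EXTEND_KEY + "=true"))
print(old.isoformat(), new.isoformat())
```

Flipping the default to "true" in the following release would then complete the transition
without any further code change.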



> Retention : Some feed instances are never deleted by retention jobs.
> --------------------------------------------------------------------
>
>                 Key: FALCON-1644
>                 URL: https://issues.apache.org/jira/browse/FALCON-1644
>             Project: Falcon
>          Issue Type: Bug
>          Components: retention
>    Affects Versions: 0.8
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>             Fix For: 0.9
>
>         Attachments: FALCON-1644.patch
>
>
> Here is a sample feed XML.
> {code}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <feed name="rawEmailFeed" description="Raw customer email feed" xmlns="uri:falcon:feed:0.1">
>     <tags>externalSystem=USWestEmailServers</tags>
>     <groups>churnAnalysisDataPipeline</groups>
>     <frequency>hours(1)</frequency>
>     <timezone>UTC</timezone>
>     <late-arrival cut-off="hours(1)"/>
>     <clusters>
>         <cluster name="primaryCluster" type="source">
>             <validity start="2015-10-30T01:00Z" end="2015-10-30T10:00Z"/>
>             <retention limit="hours(10)" action="delete"/>
>         </cluster>
>     </clusters>
>     <locations>
>         <location type="data" path="/user/ambari-qa/falcon/demo/primary/input/enron/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
>         <location type="stats" path="/"/>
>         <location type="meta" path="/"/>
>     </locations>
>     <ACL owner="ambari-qa" group="users" permission="0x755"/>
>     <schema location="/none" provider="/none"/>
> </feed>
> {code}
> In the above example, the validity time is "the time interval when the feed is valid
> on this cluster". After the validity time ends, Falcon is not expected to perform any operations
> on the feed. The retention job for this feed runs from the validity start time up to the validity
> end time and deletes any feed instances older than 10 hours. As a result, some instances of the
> feed will never be deleted. In the above example, feed instances between 2015-10-30T00:00Z and
> 2015-10-30T10:00Z will never be deleted.
> Ideally, the retention coordinator job should run from "validity start time" up to "validity
> end time + retention age limit" to ensure all instances are handled.
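
The arithmetic in the issue description can be checked with a small simulation. This is a
simplified model and not Falcon's actual code: it assumes that instances exist at every
frequency step from validity start up to (but excluding) validity end, that the final
retention run fires exactly at the end of the retention window, and that a run deletes every
instance older than the retention limit at that moment. The function name is illustrative.

```python
from datetime import datetime, timedelta, timezone


def surviving_instances(start, end, frequency, limit, extend_by_limit=False):
    """Return the instance times that the final retention run would not delete."""
    # Current behaviour: the retention window closes at validity end.
    # Proposed behaviour: it closes at validity end + retention limit.
    last_run = end + limit if extend_by_limit else end
    cutoff = last_run - limit

    survivors = []
    instance = start
    while instance < end:
        if instance >= cutoff:  # not yet older than the limit at the last run
            survivors.append(instance)
        instance += frequency
    return survivors


# Values taken from the sample feed: hourly instances, 10-hour retention limit.
start = datetime(2015, 10, 30, 1, 0, tzinfo=timezone.utc)
end = datetime(2015, 10, 30, 10, 0, tzinfo=timezone.utc)
hour = timedelta(hours=1)

current = surviving_instances(start, end, hour, 10 * hour)
proposed = surviving_instances(start, end, hour, 10 * hour, extend_by_limit=True)
print(len(current), len(proposed))
```

Under these assumptions every instance of the sample feed survives with the current window,
and none survives once the window is extended by the retention limit.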



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
