falcon-commits mailing list archives

From sowmya...@apache.org
Subject falcon git commit: FALCON-1767 Improve Falcon retention policy documentation
Date Tue, 03 May 2016 21:21:37 GMT
Repository: falcon
Updated Branches:
  refs/heads/master 2d51db7a0 -> fc34d42cb

FALCON-1767 Improve Falcon retention policy documentation

Author: Sowmya Ramesh <sramesh@hortonworks.com>

Reviewers: "Balu Vellanki <balu@apache.org>"

Closes #121 from sowmyaramesh/FALCON-1767

Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/fc34d42c
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/fc34d42c
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/fc34d42c

Branch: refs/heads/master
Commit: fc34d42cbe1a325d686d65fdf7d863d254d7e4d1
Parents: 2d51db7
Author: Sowmya Ramesh <sowmya_kr@apache.org>
Authored: Tue May 3 14:21:31 2016 -0700
Committer: Sowmya Ramesh <sramesh@hortonworks.com>
Committed: Tue May 3 14:21:31 2016 -0700

 docs/src/site/twiki/FalconDocumentation.twiki | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/src/site/twiki/FalconDocumentation.twiki b/docs/src/site/twiki/FalconDocumentation.twiki
index 122435a..2d67070 100644
--- a/docs/src/site/twiki/FalconDocumentation.twiki
+++ b/docs/src/site/twiki/FalconDocumentation.twiki
@@ -266,6 +266,12 @@ to false in runtime.properties.
 With the integration of Hive, Falcon also provides retention for tables in Hive catalog.
+When a feed is scheduled, Falcon kicks off the retention policy immediately. When the job runs, it deletes everything that is eligible for eviction; eligibility is determined by the date pattern on the partition, NOT by the creation date.
+For example, if the retention limit is 90 days, the retention job consistently deletes files whose partition date is older than 90 days.
+For retention, Falcon expects data to be in dated partitions. When the retention job is kicked off, it discovers the data that needs to be evicted based on the retention policy. It gets the location from the feed and uses pattern matching
+on that location to list the data for the feed, then extracts the date from each data path. If that date falls outside the retention limit, the data is deleted. Because discovery relies only on pattern matching against paths, it is not time consuming and does not add significant performance overhead.
 ---+++ Example:
 If the retention period is 10 hours and the policy kicks in at time 't', the data retained by the system is essentially the
 data dated at or after t-10h. Any data dated before t-10h is removed from the system.
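The discovery step described in the added documentation (read the feed location, match it against existing paths, pull the date out of each path, compare it with the retention limit) can be sketched as below. This is a minimal illustration, not Falcon's own eviction code: it assumes a feed location built from `${YEAR}`, `${MONTH}`, `${DAY}` and `${HOUR}` date variables, and the path `/data/clicks/...` and the helpers `location_to_regex`, `partition_date` and `paths_to_evict` are hypothetical. Note that it never looks at file creation or modification times, only at the date encoded in the path.

```python
import re
from datetime import datetime, timedelta

# Hypothetical mapping from feed-location date variables to named regex groups.
DATE_VARS = {
    "${YEAR}": r"(?P<year>\d{4})",
    "${MONTH}": r"(?P<month>\d{2})",
    "${DAY}": r"(?P<day>\d{2})",
    "${HOUR}": r"(?P<hour>\d{2})",
}


def location_to_regex(location):
    """Compile a location such as /data/clicks/${YEAR}/${MONTH}/${DAY} into a regex."""
    # Escape the whole location, then swap each escaped date variable for its capture group.
    pattern = re.escape(location)
    for var, group in DATE_VARS.items():
        pattern = pattern.replace(re.escape(var), group)
    return re.compile("^" + pattern + "$")


def partition_date(path, regex):
    """Return the date encoded in the path, or None if the path does not match."""
    match = regex.match(path)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumes the location contains ${YEAR}; missing finer fields default to the period start.
    return datetime(
        int(fields["year"]),
        int(fields.get("month") or 1),
        int(fields.get("day") or 1),
        int(fields.get("hour") or 0),
    )


def paths_to_evict(paths, location, retention, now):
    """List the paths whose partition date is older than now - retention."""
    regex = location_to_regex(location)
    cutoff = now - retention
    evict = []
    for path in paths:
        date = partition_date(path, regex)
        # Paths that do not follow the dated pattern are left alone.
        if date is not None and date < cutoff:
            evict.append(path)
    return evict


if __name__ == "__main__":
    paths = ["/data/clicks/2016/01/01", "/data/clicks/2016/04/30", "/data/clicks/tmp"]
    print(paths_to_evict(paths, "/data/clicks/${YEAR}/${MONTH}/${DAY}",
                         timedelta(days=90), datetime(2016, 5, 3)))
    # prints ['/data/clicks/2016/01/01']
```

With a 90-day limit evaluated on 2016-05-03 the cutoff is 2016-02-03, so only the January partition is evicted and the undated `tmp` directory is ignored.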

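The boundary in the 10-hour example (data at exactly t-10h is kept, anything earlier is removed) can be checked with a small sketch. The function name `split_by_retention` and the hourly instance times are hypothetical; the only rule taken from the text is the inclusive cutoff at t minus the retention period.

```python
from datetime import datetime, timedelta


def split_by_retention(instances, t, retention):
    """Split instance times into (retained, evicted) for a policy that kicks in at time t.

    Instances at or after t - retention are retained; strictly earlier ones are evicted.
    """
    cutoff = t - retention
    retained = [i for i in instances if i >= cutoff]
    evicted = [i for i in instances if i < cutoff]
    return retained, evicted


if __name__ == "__main__":
    t = datetime(2016, 5, 3, 12, 0)
    # Hourly instances from 00:00 to 04:00 on the same day.
    hourly = [t - timedelta(hours=h) for h in range(12, 7, -1)]
    retained, evicted = split_by_retention(hourly, t, timedelta(hours=10))
    print([i.strftime("%H:%M") for i in retained])  # prints ['02:00', '03:00', '04:00']
    print([i.strftime("%H:%M") for i in evicted])   # prints ['00:00', '01:00']
```

The 02:00 instance sits exactly on t-10h and is retained, which is the "at or after" behaviour the example describes.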