falcon-commits mailing list archives

From ajayyad...@apache.org
Subject falcon git commit: FALCON-1204 Expose default configs for feed late data handling in runtime.properties. Contributed by Balu Vellanki.
Date Thu, 16 Jul 2015 06:37:58 GMT
Repository: falcon
Updated Branches:
  refs/heads/master 9066eac27 -> 09841bbea


FALCON-1204 Expose default configs for feed late data handling in runtime.properties. Contributed by Balu Vellanki.


Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/09841bbe
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/09841bbe
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/09841bbe

Branch: refs/heads/master
Commit: 09841bbeab843df681f70ca21eb1c856507149c2
Parents: 9066eac
Author: Ajay Yadava <ajaynsit@gmail.com>
Authored: Thu Jul 16 12:06:48 2015 +0530
Committer: Ajay Yadava <ajaynsit@gmail.com>
Committed: Thu Jul 16 12:06:48 2015 +0530

----------------------------------------------------------------------
 CHANGES.txt                                   |  2 ++
 common/src/main/resources/runtime.properties  |  7 ++++++-
 docs/src/site/twiki/FalconDocumentation.twiki | 12 +++++++++++-
 src/conf/runtime.properties                   | 11 +++++++++--
 4 files changed, 28 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 8b96e78..63298f0 100755
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -9,6 +9,8 @@ Trunk (Unreleased)
     FALCON-796 Enable users to triage data processing issues through falcon (Ajay Yadava)
     
   IMPROVEMENTS
+    FALCON-1204 Expose default configs for feed late data handling in runtime.properties(Balu Vellanki via Ajay Yadava)
+
     FALCON-1170 Falcon Native Scheduler - Refactor existing workflow/coord/bundle builder(Pallavi Rao via Ajay Yadava)
     
     FALCON-1031 Make post processing notifications to user topics optional (Pallavi Rao via Ajay Yadava)

http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/common/src/main/resources/runtime.properties
----------------------------------------------------------------------
diff --git a/common/src/main/resources/runtime.properties b/common/src/main/resources/runtime.properties
index 8d465e8..3b32463 100644
--- a/common/src/main/resources/runtime.properties
+++ b/common/src/main/resources/runtime.properties
@@ -23,4 +23,9 @@
 
 *.falcon.replication.workflow.maxmaps=5
 *.falcon.replication.workflow.mapbandwidth=100
-webservices.default.max.results.per.page=100
+*.webservices.default.max.results.per.page=100
+
+# Default configs to handle replication for late arriving feeds.
+*.feed.late.allowed=true
+*.feed.late.frequency=hours(3)
+*.feed.late.policy=exp-backoff
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/docs/src/site/twiki/FalconDocumentation.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/FalconDocumentation.twiki b/docs/src/site/twiki/FalconDocumentation.twiki
index c374966..9804a57 100644
--- a/docs/src/site/twiki/FalconDocumentation.twiki
+++ b/docs/src/site/twiki/FalconDocumentation.twiki
@@ -561,7 +561,7 @@ simple and basic. The falcon system looks at all dependent input feeds for a pro
 cut-off period. Then it uses a scheduled messaging framework, like the one available in Apache ActiveMQ or Java's !DelayQueue to schedule a message with a cut-off period, then after a cut-off period the message is dequeued and Falcon checks for changes in the feed data which is recorded in HDFS in latedata file by falcons "record-size" action, if it detects any changes then the workflow will be rerun with the new set of feed data.
 
 *Example:*
-The late rerun policy can be configured in the process definition.
+For a process entity, the late rerun policy can be configured in the process definition.
 Falcon supports 3 policies, periodic, exp-backoff and final.
 Delay specifies, how often the feed data should be checked for changes, also one needs to
 explicitly set the feed names in late-input which needs to be checked for late data.
@@ -575,6 +575,16 @@ explicitly set the feed names in late-input which needs to be checked for late d
 *NOTE:* Feeds configured with table storage does not support late input data handling at this point. This will be
 made available in the near future.
 
+For a feed entity replication job, the default late data handling policy can be configured in the runtime.properties file.
+Since these properties are runtime.properties, they will take effect for all replication jobs completed subsequent to the change.
+<verbatim>
+  # Default configs to handle replication for late arriving feeds.
+  *.feed.late.allowed=true
+  *.feed.late.frequency=hours(3)
+  *.feed.late.policy=exp-backoff
+</verbatim>
+
+
 ---++ Idempotency
 All the operations in Falcon are Idempotent. That is if you make same request to the falcon server / prism again you will get a SUCCESSFUL return if it was SUCCESSFUL in the first attempt. For example, you submit a new process / feed and get SUCCESSFUL message return. Now if you run the same command / api request on same entity you will again get a SUCCESSFUL message. Same is true for other operations like schedule, kill, suspend and resume.
 Idempotency also by takes care of the condition when request is sent through prism and fails on one or more servers. For example prism is configured to send request to 3 servers. First user sends a request to SUBMIT a process on all 3 of them, and receives a response SUCCESSFUL from all of them. Then due to some issue one of the servers goes down, and user send a request to schedule the submitted process. This time he will receive a response with PARTIAL status and a FAILURE message from the server that has gone down. If the users check he will find the process would have been started and running on the 2 SUCCESSFUL servers. Now the issue with server is figured out and it is brought up. Sending the SCHEDULE request again through prism will result in a SUCCESSFUL response from prism as well as other three servers, but this time PROCESS will be SCHEDULED only on the server which had failed earlier and other two will keep running as before.
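[Editor's note: the twiki change above says the per-process late rerun policy (periodic, exp-backoff, or final) is set in the process definition via late-input. A minimal sketch of such a block follows; the feed name and workflow path are illustrative, not taken from this commit:]

<verbatim>
  <late-process policy="exp-backoff" delay="hours(2)">
      <late-input input="impression" workflow-path="hdfs://projects/example/late/workflow"/>
  </late-process>
</verbatim>

The new runtime.properties keys in this commit supply the same three knobs (allowed, frequency, policy) as cluster-wide defaults for feed replication jobs, which have no process definition to carry a late-process block.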

http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/src/conf/runtime.properties
----------------------------------------------------------------------
diff --git a/src/conf/runtime.properties b/src/conf/runtime.properties
index a40d369..58dee3d 100644
--- a/src/conf/runtime.properties
+++ b/src/conf/runtime.properties
@@ -26,8 +26,15 @@
 #prism should have the following properties
 prism.all.colos=local
 prism.falcon.local.endpoint=https://localhost:15443
-#falcon server should have the following properties
+
+# falcon server should have the following properties
 falcon.current.colo=local
 webservices.default.max.results.per.page=100
+
 # retry count - to fetch the status from the workflow engine
-workflow.status.retry.count=30
\ No newline at end of file
+workflow.status.retry.count=30
+
+# Default configs to handle replication for late arriving feeds.
+feed.late.allowed=true
+feed.late.frequency=hours(3)
+feed.late.policy=exp-backoff
\ No newline at end of file

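[Editor's note: the twiki documentation in this diff describes scheduling a cut-off message via Apache ActiveMQ or Java's DelayQueue, then re-checking the feed data once the message is dequeued. A minimal self-contained sketch of that scheduling idea, with illustrative names that are not Falcon's own classes:]

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Sketch: each pending late-data check is a delayed message that only
// becomes visible once its cut-off period elapses; the consumer then
// compares recorded data sizes and reruns the workflow on a change.
public class LateCheckSketch {
    static class LateCheck implements Delayed {
        final String feed;
        final long readyAtMillis;
        LateCheck(String feed, long delayMillis) {
            this.feed = feed;
            this.readyAtMillis = System.currentTimeMillis() + delayMillis;
        }
        @Override public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtMillis - System.currentTimeMillis(),
                                TimeUnit.MILLISECONDS);
        }
        @Override public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                                other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        DelayQueue<LateCheck> queue = new DelayQueue<>();
        queue.put(new LateCheck("clicks-feed", 100)); // 100 ms cut-off for demo
        LateCheck due = queue.take(); // blocks until the cut-off elapses
        // At this point Falcon would inspect the recorded "latedata" sizes.
        System.out.println("late-data check due for " + due.feed);
    }
}
```

In Falcon itself the frequency for these checks comes from the policy (e.g. `exp-backoff` with `*.feed.late.frequency=hours(3)` as the base, per this commit's new defaults).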
