falcon-dev mailing list archives

From "Pallavi Rao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1602) Recoverability of Falcon Processes when ActiveMQ down for sometime
Date Mon, 25 Jan 2016 04:56:39 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114754#comment-15114754 ]

Pallavi Rao commented on FALCON-1602:

My initial thought was that the JobCompletionService should use a combination of notification
and polling. If there are no notifications for "some time", poll Oozie to see if the job completed.
"Some time" can be a function of duration of previous instance runs or SLA when it exists.
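
A rough sketch of that idea, in Python only for brevity. Nothing here is Falcon's actual JobCompletionService code: the InstanceWatch class, the quiet_period and check_instance functions, the 1.5x slack factor and the 600 second default are all hypothetical, and poll_status stands in for whatever call would ask Oozie for the job status. The terminal status names are assumed to match Oozie workflow job statuses.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List, Optional

# Assumed terminal workflow statuses; adjust to whatever the engine reports.
TERMINAL = {"SUCCEEDED", "KILLED", "FAILED"}


@dataclass
class InstanceWatch:
    """Book-keeping for one running process instance (hypothetical)."""
    job_id: str
    last_signal_at: float                       # last notification or poll, in seconds
    previous_durations: List[float] = field(default_factory=list)
    sla_seconds: Optional[float] = None
    completed: bool = False


def quiet_period(watch: InstanceWatch, default: float = 600.0,
                 slack: float = 1.5) -> float:
    """Return the "some time" to wait without notifications before polling.

    Prefers the SLA when one exists, otherwise scales the mean duration of
    previous instance runs, otherwise falls back to a fixed default.
    """
    if watch.sla_seconds is not None:
        return watch.sla_seconds
    if watch.previous_durations:
        return slack * mean(watch.previous_durations)
    return default


def check_instance(watch: InstanceWatch, now: float,
                   poll_status: Callable[[str], str]) -> bool:
    """Poll only if notifications have been silent too long.

    Returns True when a poll was made, False otherwise.
    """
    if watch.completed:
        return False
    if now - watch.last_signal_at < quiet_period(watch):
        return False
    status = poll_status(watch.job_id)
    if status in TERMINAL:
        watch.completed = True
    watch.last_signal_at = now  # restart the quiet period after each poll
    return True
```

The notification listener would update last_signal_at (and completed) whenever a message does arrive, so polling only kicks in while the notification channel stays silent.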

> Recoverability of Falcon Processes when ActiveMQ down for sometime 
> -------------------------------------------------------------------
>                 Key: FALCON-1602
>                 URL: https://issues.apache.org/jira/browse/FALCON-1602
>             Project: Falcon
>          Issue Type: Task
>            Reporter: pavan kumar kolamuri
> With the Falcon native scheduler, ActiveMQ is used for job completion notifications from Oozie.
> When ActiveMQ is down for some time and Oozie fails to send completion notifications for the
> workflows of process instances even after retries, those instances are not marked as completed
> in the Falcon state store. New instances of those processes are then not launched, because the
> old ones are assumed to be still running. There should be some recoverability in these cases.

This message was sent by Atlassian JIRA
