airavata-dev mailing list archives

From Raminder Singh <>
Subject Re: Stateful vs. fire-and-forget GFac providers
Date Thu, 24 Oct 2013 16:27:17 GMT
Thanks Marlon for starting the discussion.  I think this change can solve multiple issues for gateways:

1. Jobs sometimes become zombies and lose their state. Having a monitoring component outside
GFAC would let us provide an interface to update the state if the client thinks the job has already
finished. Then the jobs will not be a black box for the clients.
2. This can lead to a better job management interface for gateways, as the job state
is saved outside GFAC. We can also make better recovery decisions based on human input.
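
To make the two points above concrete, here is a minimal sketch in Java of job state kept outside the provider. Every name in it (JobState, JobStateRegistry, and the methods) is hypothetical and does not come from the Airavata code base. It also assumes one particular policy that the thread has not discussed: once a client has corrected a job's state, later monitor reports for that job are ignored.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical states; the thread does not define the real set.
enum JobState { SUBMITTED, RUNNING, FINISHED, FAILED, UNKNOWN }

/** Keeps job state outside the provider so other components can read or correct it. */
class JobStateRegistry {
    private final Map<String, JobState> states = new HashMap<>();
    private final Set<String> overridden = new HashSet<>();

    /** Called by a fire-and-forget provider right after it submits a job. */
    synchronized void register(String jobId) {
        states.put(jobId, JobState.SUBMITTED);
    }

    /** Called by the external monitor; ignored for unknown or client-corrected jobs. */
    synchronized void reportFromMonitor(String jobId, JobState observed) {
        if (states.containsKey(jobId) && !overridden.contains(jobId)) {
            states.put(jobId, observed);
        }
    }

    /** Called for a gateway client that knows the real state (assumed policy: client wins). */
    synchronized void overrideByClient(String jobId, JobState asserted) {
        if (states.containsKey(jobId)) {
            states.put(jobId, asserted);
            overridden.add(jobId);
        }
    }

    /** Returns the last recorded state, or empty if the job was never registered. */
    synchronized Optional<JobState> lookup(String jobId) {
        return Optional.ofNullable(states.get(jobId));
    }
}
```

With something like this, a job the monitor has lost track of can be marked FINISHED by the gateway through overrideByClient, and the provider itself holds no state after submission.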

I think we will also be able to solve the workflow problem along these lines by introducing a Job Orchestrator
or some state machine, and the workflow interpreter can rely on that for workflow orchestration.
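
As a rough illustration of the state machine idea, the sketch below encodes which job transitions are legal, so that an orchestrator or workflow interpreter could reject out-of-order updates. The phases and the transition table are assumptions made for this example only, and JobPhase and JobStateMachine are hypothetical names, not existing Airavata classes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical lifecycle; the real phases would come out of the design discussion.
enum JobPhase {
    SUBMITTED, QUEUED, RUNNING, FINISHED, FAILED;

    /** Phases that may legally follow this one. */
    Set<JobPhase> next() {
        switch (this) {
            case SUBMITTED: return EnumSet.of(QUEUED, RUNNING, FAILED);
            case QUEUED:    return EnumSet.of(RUNNING, FAILED);
            case RUNNING:   return EnumSet.of(FINISHED, FAILED);
            default:        return EnumSet.noneOf(JobPhase.class);
        }
    }

    /** FINISHED and FAILED have no successors. */
    boolean isTerminal() {
        return next().isEmpty();
    }
}

/** Tracks one job and refuses transitions that the table above does not allow. */
class JobStateMachine {
    private JobPhase current = JobPhase.SUBMITTED;

    JobPhase current() {
        return current;
    }

    /** Applies the transition if legal; otherwise returns false and changes nothing. */
    boolean advance(JobPhase target) {
        if (!current.next().contains(target)) {
            return false;
        }
        current = target;
        return true;
    }
}
```

A workflow interpreter could then wait for current().isTerminal() on each node's job instead of blocking inside the provider.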

+1 to adding this and bringing some design discussion to the list. 


On Oct 24, 2013, at 12:00 PM, Lahiru Gunathilake <> wrote:

> Hi Marlon,
> In Airavata, since we use GFAC in embedded mode with the Workflow Interpreter, it is
> not really fire-and-forget even if we implement this in GFAC core.
> But it will not be bad, since in WorkflowInterpreter we are handling each node in a separate
> thread. And if we are going to use GFAC as a separate job-submitting component, this will definitely
> make sense.
> So I am +1 for this change.
> Regards
> Lahiru
> On Thu, Oct 24, 2013 at 11:48 AM, Marlon Pierce <> wrote:
> The current GFAC providers all execute tasks in "blocking" mode: the
> provider stays active until the job terminates. This introduces some
> tradeoffs. On the one hand, determining the job state is very
> provider-specific. Doing it all in the provider makes things relatively
> simple to implement.
> On the other hand, this makes Airavata's state complicated.  This
> increases the difficulty of handling fault recovery and "elastic"
> scenarios, where we may need to restart failed servers, pass work from
> one running instance to another, and so forth.
> If we wanted to make the provider stateless and move monitoring to a
> different place, this would take some thoughtful design--I don't have an
> idea of the scope--so even if we all agreed it is a good idea, we have
> to overcome an energy barrier of a current system that is good enough
> for what we need to do.
> What are your thoughts?  We had a related discussion about this for a
> specific use case back in July [1].
> Marlon
> [1]
> -- 
> System Analyst Programmer
> PTI Lab
> Indiana University
