airavata-dev mailing list archives

From Supun Kamburugamuva <>
Subject Re: Stateful vs. fire-and-forget GFac providers
Date Thu, 24 Oct 2013 18:43:09 GMT
My thoughts are along the lines of making GFac stateless for better
replication and recovery.

Maybe what Airavata needs is an Execution Plan concept in GFac. An execution
plan consists of execution blocks and their execution order, and the plan
should be serializable for replication. When a job is submitted to GFac, it
should be able to create a full execution plan from the information provided.
This plan can then be replicated to other nodes, and execution of the blocks
can be coordinated.
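To make the idea concrete, here is a minimal sketch of what such a serializable plan might look like. All class, enum, and field names here are hypothetical illustrations, not existing Airavata or GFac APIs:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an execution plan is an ordered list of blocks.
// It implements Serializable so a copy can be shipped to other GFac nodes.
public class ExecutionPlan implements Serializable {
    public enum BlockType { INPUT_FILE_TRANSFER, JOB_SUBMISSION, OUTPUT_FILE_TRANSFER }
    public enum BlockState { PENDING, RUNNING, COMPLETED, FAILED }

    public static class ExecutionBlock implements Serializable {
        public final BlockType type;
        public BlockState state = BlockState.PENDING;
        public ExecutionBlock(BlockType type) { this.type = type; }
    }

    private final String experimentId;
    // Blocks are kept in execution order; the blocks themselves hold no
    // node-local state, only their position and status in the plan.
    private final List<ExecutionBlock> blocks = new ArrayList<>();

    public ExecutionPlan(String experimentId) { this.experimentId = experimentId; }
    public void addBlock(ExecutionBlock b) { blocks.add(b); }
    public List<ExecutionBlock> getBlocks() { return blocks; }
    public String getExperimentId() { return experimentId; }
}
```

The key design point is that everything a node needs to continue the work lives in the plan object itself, not in the memory of the node that created it.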

GFac can execute each of the execution blocks in this plan. The blocks
should be stateless. The execution blocks correspond to tasks like file
transfer, invoking the job, etc. The output of a block can be made available
to blocks further down the execution.

Because the state of the execution plan is replicated, any node should be
able to take over the execution and continue.
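A hedged sketch of what that takeover could look like, assuming each block's status is recorded in the replicated plan (the method and state names are invented for illustration, not GFac code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: resuming a replicated plan on a new node.
// Blocks are stateless; only the plan records which steps finished,
// so any node holding a copy of the plan can continue from there.
public class PlanRecovery {
    public static Map<String, String> resume(Map<String, String> blockStates) {
        // Work on a copy so the caller's snapshot is left untouched.
        Map<String, String> result = new LinkedHashMap<>(blockStates);
        for (Map.Entry<String, String> e : result.entrySet()) {
            if ("COMPLETED".equals(e.getValue())) {
                continue; // finished before the original node failed; skip
            }
            // Execute the stateless block here (file transfer, job submit, ...)
            // then record completion back into the plan for re-replication.
            e.setValue("COMPLETED");
        }
        return result;
    }
}
```

The point is that recovery reduces to "skip what the plan says is done, run the rest"; no in-memory state from the failed node is needed.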


On Thu, Oct 24, 2013 at 12:27 PM, Raminder Singh

> Thanks Marlon for starting the discussion.  I think this change can solve
> multiple issues gateways face.
> 1. Jobs sometimes become zombies and lose their state. Having a monitoring
> component outside GFAC would allow us to provide an interface to update the
> state if the client thinks the job has already finished. Then jobs will no
> longer be a black box for the clients.
> 2. This can lead to a better job management interface for gateways,
> as the job state is saved outside GFAC. We can also make better recovery
> decisions based on human input.
> I think we will also be able to solve the workflow problem along this way by
> introducing a Job Orchestrator or some state machine, and the workflow
> interpreter can rely on that for workflow orchestration.
> +1 to adding this and bringing some design discussion to the list.
> Thanks
> Raminder
> On Oct 24, 2013, at 12:00 PM, Lahiru Gunathilake <>
> wrote:
> Hi Marlon,
> In Airavata, since we are using GFAC in embedded mode with the Workflow
> Interpreter, it is not really fire-and-forget even if we implement this in
> GFAC core.
> But that is not a problem, since in WorkflowInterpreter we are handling
> each node in a separate thread. If we are going to use GFac as a separate
> job-submitting component, though, this change definitely makes sense.
> So I am +1 for this change.
> Regards
> Lahiru
> On Thu, Oct 24, 2013 at 11:48 AM, Marlon Pierce <> wrote:
>> The current GFAC providers all execute tasks in "blocking" mode: the
>> provider stays active until the job terminates. This introduces some
>> tradeoffs. On the one hand, determining the job state is very
>> provider-specific. Doing it all in the provider makes things relatively
>> simple to implement.
>> On the other hand, this makes Airavata's state management complicated. This
>> increases the difficulty of handling fault recovery and "elastic"
>> scenarios, where we may need to restart failed servers, pass work from
>> one running instance to another, and so forth.
>> If we wanted to make the provider stateless and move monitoring to a
>> different place, this would take some thoughtful design--I don't have an
>> idea of the scope--so even if we all agreed it is a good idea, we have
>> to overcome an energy barrier of a current system that is good enough
>> for what we need to do.
>> What are your thoughts?  We had a related discussion about this for a
>> specific use case back in July [1].
>> Marlon
>> [1]
> --
> System Analyst Programmer
> PTI Lab
> Indiana University

Supun Kamburugamuva
Member, Apache Software Foundation;
E-mail:;  Mobile: +1 812 369 6762
