airavata-dev mailing list archives

From Lahiru Gunathilake <glah...@gmail.com>
Subject Re: Airavata Orchestrator component
Date Fri, 06 Dec 2013 21:58:11 GMT
Hi All,

I have added a Google doc [1] that anyone can comment on.

[1] https://docs.google.com/document/d/11fjql09tOiC0NLBaqdhZ9WAiMoBhkBJl7WC1N7DigcU/edit
Regards
Lahiru


On Thu, Dec 5, 2013 at 2:34 PM, Lahiru Gunathilake <glahiru@gmail.com> wrote:

> Hi All,
>
> We are thinking of implementing an Airavata Orchestrator component to
> replace WorkflowInterpreter, so that gateway developers do not have to deal
> with workflows when they simply have a single independent job to run in
> their gateways. This component mainly focuses on how to invoke GFAC and
> accept requests from the client API.
>
> I have the following features in mind for this component.
>
> 1. It exposes a web service or REST interface, against which we can
> implement a client to submit jobs.
>
> 2. It accepts a job request and parses the input types; if the input types
> are correct, it creates an Airavata experiment ID.
>
> 3. The Orchestrator then stores the job information in the registry against
> the generated experiment ID (all the other components identify the job
> using this experiment ID).
>
> 4. After that, the Orchestrator pulls up all the descriptors related to this
> request, does some scheduling to decide where to run the job, and submits
> the job to a GFAC node (handling multiple GFAC nodes is going to be a
> future improvement in the Orchestrator).
>
> Pull-based job submission might be a good way to handle errors: if we store
> jobs in the Registry and GFAC pulls jobs and executes them, the Orchestrator
> component really doesn't have to worry about error handling.
>
> This is because we can implement logic in GFAC so that, if a particular job
> has not updated its status for a given time, GFAC assumes that the job has
> hung or that the GFAC node handling it has failed. A GFAC instance then
> pulls that job (we definitely need a locking mechanism here, so that two
> instances do not both execute the hung job) and starts executing it. (If
> GFAC is handling a long-running job, it still has to update the job status
> frequently, even with the same status, to show that the GFAC node is
> running.)
>
> 5. GFAC creates its execution chain and stores it back in the registry with
> the experiment ID, and GFAC updates its state using checkpointing.
>
>
> 6. If we are not doing pull-based submission, then during a GFAC failure
> the Orchestrator has to identify it and resubmit the active jobs from the
> failed GFAC node to other nodes. This might cause job duplication if the
> Orchestrator raises a false alarm about a GFAC failure (so we have to
> handle this carefully).
>
> We have a lot more to discuss about GFAC, but I am limiting this discussion
> to the Orchestrator component for now.
>
> WDYT about this design?
>
> Lahiru
>
> --
> System Analyst Programmer
> PTI Lab
> Indiana University
>
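
To make steps 2 and 3 of the quoted proposal a bit more concrete, here is a rough sketch of how the Orchestrator might validate input types, create an experiment ID, and store the job against it. Everything in it is an assumption for illustration: OrchestratorSketch, submit, and isStored are hypothetical names, the registry is just an in-memory map, and the experiment ID is a random UUID. None of this is an existing Airavata API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; these are not Airavata classes.
final class OrchestratorSketch {
    // Stand-in for the registry: experiment ID -> stored job inputs.
    private final Map<String, Map<String, Object>> registry = new ConcurrentHashMap<>();

    /**
     * Step 2: check every expected input against its declared type.
     * Step 3: on success, create an experiment ID and store the job under it.
     * Returns an empty Optional when an input is missing or has the wrong type.
     */
    Optional<String> submit(Map<String, Class<?>> expectedTypes, Map<String, Object> inputs) {
        for (Map.Entry<String, Class<?>> expected : expectedTypes.entrySet()) {
            Object value = inputs.get(expected.getKey());
            if (value == null || !expected.getValue().isInstance(value)) {
                return Optional.empty();
            }
        }
        // Assumption: a random UUID is good enough as an experiment ID here.
        String experimentId = UUID.randomUUID().toString();
        registry.put(experimentId, new HashMap<>(inputs));
        return Optional.of(experimentId);
    }

    /** True if a job was stored under the given experiment ID. */
    boolean isStored(String experimentId) {
        return registry.containsKey(experimentId);
    }
}
```

A client would call submit with the declared input types and the actual values, and use the returned experiment ID for every later lookup. Scheduling and the actual hand-off to GFAC (step 4) are left out on purpose.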

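The pull-based recovery idea in the quoted proposal (status updates as a heartbeat, plus a lock so that only one GFAC instance takes over a hung job) could look roughly like the sketch below. JobClaimSketch, tryClaim, and heartbeat are hypothetical names, and the timeout handling is an assumption. The atomic compute call on a ConcurrentHashMap stands in for the locking mechanism; a real multi-node setup would need an equivalent atomic update in the shared registry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; these are not Airavata classes.
final class JobClaimSketch {
    // One entry per job: the owning GFAC node and its last status update time.
    private static final class Entry {
        final String owner;
        final long lastUpdateMillis;

        Entry(String owner, long lastUpdateMillis) {
            this.owner = owner;
            this.lastUpdateMillis = lastUpdateMillis;
        }
    }

    private final Map<String, Entry> jobs = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    JobClaimSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Tries to claim a job for a node. The claim succeeds only if the job has
     * no owner yet or its owner has been silent for longer than the timeout.
     * compute() runs atomically per key, so two nodes cannot both win.
     */
    boolean tryClaim(String experimentId, String node, long nowMillis) {
        boolean[] claimed = {false};
        jobs.compute(experimentId, (id, current) -> {
            boolean stale = current == null
                    || nowMillis - current.lastUpdateMillis > timeoutMillis;
            if (!stale) {
                return current; // owner is still alive, leave the job alone
            }
            claimed[0] = true;
            return new Entry(node, nowMillis);
        });
        return claimed[0];
    }

    /**
     * The owning node re-writes the job status to show it is still running.
     * Returns false if the node does not (or no longer) own the job.
     */
    boolean heartbeat(String experimentId, String node, long nowMillis) {
        Entry updated = jobs.computeIfPresent(experimentId, (id, current) ->
                current.owner.equals(node) ? new Entry(node, nowMillis) : current);
        return updated != null && updated.owner.equals(node);
    }
}
```

The time is passed in as a parameter so the behaviour is easy to test. A node whose heartbeat returns false has lost the job to another node and should stop working on it, which is one way to limit the duplication risk mentioned in item 6.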


-- 
System Analyst Programmer
PTI Lab
Indiana University
