airavata-dev mailing list archives

From Lahiru Gunathilake <glah...@gmail.com>
Subject Airavata Orchestrator component
Date Thu, 05 Dec 2013 19:34:55 GMT
Hi All,

We are thinking of implementing an Airavata Orchestrator component to
replace WorkflowInterpreter, so that gateway developers do not have to deal
with workflows when they simply have a single independent job to run in
their gateways. This component mainly focuses on how to invoke GFAC and
accept requests from the client API.

I have the following features in mind for this component.

1. It provides a web service or REST interface, against which we can
implement a client to submit jobs.

2. It accepts a job request and parses the input types; if the input types
are correct, it creates an Airavata experiment ID.

3. The Orchestrator then stores the job information in the registry against
the generated experiment ID (all the other components identify the job
using this experiment ID).

4. After that, the Orchestrator pulls up all the descriptors related to this
request, does some scheduling to decide where to run the job, and submits
the job to a GFAC node (handling multiple GFAC nodes is going to be a
future improvement in the Orchestrator).
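
To make steps 1 to 4 concrete, here is a minimal Python sketch of the flow.
Everything in it is an assumption for illustration: InMemoryRegistry,
EXPECTED_INPUTS, choose_gfac_node and submit_job are hypothetical names, not
Airavata APIs, and the "scheduler" just picks the first node.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical expected input types for one application. In a real system
# these would come from the application descriptors, not a hard-coded dict.
EXPECTED_INPUTS = {"input_file": str, "iterations": int}


@dataclass
class InMemoryRegistry:
    """Stand-in for the Registry: maps experiment ID -> job record."""
    jobs: dict = field(default_factory=dict)

    def save(self, experiment_id, record):
        self.jobs[experiment_id] = record


def validate_inputs(inputs, expected=EXPECTED_INPUTS):
    """Step 2: reject requests whose inputs are missing or mistyped."""
    for name, typ in expected.items():
        if name not in inputs:
            raise ValueError(f"missing input: {name}")
        if not isinstance(inputs[name], typ):
            raise TypeError(f"input {name!r} must be {typ.__name__}")


def choose_gfac_node(nodes):
    """Step 4: placeholder scheduler that simply picks the first node."""
    if not nodes:
        raise RuntimeError("no GFAC node available")
    return nodes[0]


def submit_job(registry, inputs, gfac_nodes):
    """Validate, create an experiment ID, schedule, and record the job."""
    validate_inputs(inputs)                  # step 2: check input types
    experiment_id = str(uuid.uuid4())        # step 2: new experiment ID
    node = choose_gfac_node(gfac_nodes)      # step 4: decide where to run
    registry.save(experiment_id, {           # step 3: store against the ID
        "inputs": dict(inputs),
        "status": "SUBMITTED",
        "gfac_node": node,
    })
    return experiment_id
```

A client-facing web service or REST endpoint (step 1) would be a thin
wrapper that calls something like submit_job and returns the experiment ID.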

If we do pull-based job submission, it might be a good way to handle
errors: if we store jobs in the Registry and GFAC pulls jobs and executes
them, the Orchestrator component really doesn't have to worry about error
handling.

This is because we can implement logic in GFAC so that, if a particular job
has not updated its status for a given time, GFAC assumes that either the
job is hung or the GFAC node handling that job has failed. GFAC then pulls
that job (we definitely need a locking mechanism here, so that two
instances do not both execute the same hung job) and starts executing it.
(If GFAC is handling a long-running job, it still has to update the job
status frequently, even with the same status, to show that the GFAC node is
still running.)
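
As a rough sketch of the pull-based idea, the Python below shows a job table
where a GFAC node can claim a job only if it is unowned or its status
timestamp is stale, with the check and the claim done under one lock. The
names (JobStore, claim, heartbeat) and the 60 second timeout are my
assumptions; with several GFAC processes the lock would have to live in the
Registry itself rather than in process memory as it does here.

```python
import threading
import time

STALE_AFTER_SECONDS = 60.0  # assumed timeout; would need tuning in practice


class JobStore:
    """In-memory stand-in for the Registry's job table (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for a registry-level lock
        self._jobs = {}                # experiment_id -> job record

    def add(self, experiment_id, now=None):
        now = time.time() if now is None else now
        with self._lock:
            self._jobs[experiment_id] = {
                "owner": None, "status": "QUEUED", "heartbeat": now,
            }

    def claim(self, node_id, now=None):
        """Atomically claim one unowned or stale job; return its ID or None."""
        now = time.time() if now is None else now
        with self._lock:
            for experiment_id, job in self._jobs.items():
                if job["status"] == "DONE":
                    continue
                stale = now - job["heartbeat"] > STALE_AFTER_SECONDS
                if job["owner"] is None or stale:
                    job.update(owner=node_id, status="RUNNING", heartbeat=now)
                    return experiment_id
        return None

    def heartbeat(self, experiment_id, node_id, now=None):
        """Refresh the timestamp; return False if another node took over."""
        now = time.time() if now is None else now
        with self._lock:
            job = self._jobs[experiment_id]
            if job["owner"] != node_id:
                return False
            job["heartbeat"] = now
            return True
```

A node that gets False back from heartbeat knows its job has been taken
over and should stop working on it, which limits (but does not by itself
eliminate) duplicate execution.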

5. GFAC creates its execution chain and stores it back in the registry with
the experiment ID, and GFAC updates its state using checkpointing.


6. If we are not doing pull-based submission, then during a GFAC failure the
Orchestrator has to identify it and resubmit the active jobs from the failed
GFAC node to other nodes. This might cause job duplication if the
Orchestrator raises a false alarm about a GFAC failure (so we have to
handle this carefully).
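
One possible way to "handle carefully" in the push-based case is sketched
below in Python. This is only an assumption on my part, not something we
have decided: the Orchestrator keeps an attempt number per job, bumps it on
every resubmission, and ignores results reported under an older attempt, so
a node that was wrongly declared dead cannot overwrite the new run. All
names (PushOrchestrator, reassign_from_failed, accept_result) are
hypothetical.

```python
class PushOrchestrator:
    """Hypothetical push-based Orchestrator with a simple failure detector."""

    def __init__(self, nodes, timeout=60.0):
        self.timeout = timeout                     # assumed seconds of silence
        self.last_seen = {n: 0.0 for n in nodes}   # node -> last heartbeat
        self.assignments = {}                      # exp ID -> (node, attempt)

    def submit(self, experiment_id, node):
        self.assignments[experiment_id] = (node, 1)

    def node_heartbeat(self, node, now):
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Nodes that have been silent for longer than the timeout."""
        return [n for n, seen in self.last_seen.items()
                if now - seen > self.timeout]

    def reassign_from_failed(self, now):
        """Move jobs off silent nodes, bumping the attempt number."""
        failed = set(self.failed_nodes(now))
        healthy = [n for n in self.last_seen if n not in failed]
        moved = []
        if not healthy:
            return moved
        for exp_id, (node, attempt) in list(self.assignments.items()):
            if node in failed:
                target = healthy[len(moved) % len(healthy)]
                self.assignments[exp_id] = (target, attempt + 1)
                moved.append(exp_id)
        return moved

    def accept_result(self, experiment_id, node, attempt):
        """Only the current (node, attempt) pair may report a result."""
        return self.assignments.get(experiment_id) == (node, attempt)
```

This does not stop the old node from doing duplicate work on the compute
resource; it only keeps stale results out of the registry, so cancelling
the old run would still need to be handled separately.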

We have a lot more to discuss about GFAC, but I will limit our discussion
to the Orchestrator component for now.

WDYT about this design?

Lahiru

-- 
System Analyst Programmer
PTI Lab
Indiana University
