airavata-dev mailing list archives

From Lahiru Gunathilake <glah...@gmail.com>
Subject GFac App Catalog integration
Date Mon, 29 Sep 2014 14:15:58 GMT
Hi All,

We have replaced the XML descriptions we used to describe applications/hosts
with the new App Catalog design[1], so now GFac has to change to adapt to the
new App Catalog. Before jumping into the integration, I think it's a good time
to review the GFac architecture and modify it if necessary.

In the current GFac architecture we have plugins which can be executed in a
chain. This chained configuration is defined in an XML file[2] and keyed on
the computing resource type (based on the old XML descriptions)[3].
Basically, we can configure one execution chain for a GSISSH-type resource,
another chain for an SSH-type resource, and another one for some other type
(like EC2). Currently we differentiate hosts based on their authentication
mechanism. This architecture leads to the following limitations.


   1. There could be scenarios where the same host type needs a different
   execution pattern; in this model we cannot have two execution chains for
   the same host type.
   2. There could be cases where we want to run a particular handler only for
   a given machine within the same host type (e.g. for Stampede, run Handler1
   at the end, but not for any other machine).
   3. Differentiating hosts based on authentication doesn't look right,
   because a few machines authenticate with a different mechanism while
   everything else is the same. This problem has already been solved in the
   App Catalog design, which exposes the available authentication mechanisms
   for a given compute resource.
   4. Currently the execution chain is picked up front based on the host and
   then executed, but we have no fallback execution chain or any fault
   tolerance at the experiment level. Should we handle fault tolerance at
   this level, or just fail the experiment and have the Orchestrator send
   another job request to GFac with a different computing resource or a
   different authentication mechanism (if authentication failed)? Note: the
   only fault tolerance we have today covers the rare case where a particular
   GFac instance stops responding, which usually happens only under very
   heavy load. In that case another GFac instance can pick up the execution
   chain from the checkpointed location and execute the rest of the chain,
   because the chain-picking logic is statically configured in an XML file,
   and plugins themselves can implement a recover method which the GFac core
   invokes during recovery.
   5. There is currently a way to configure a chain based on the gateway
   name, but it is a simple configuration: one gateway name can have only one
   configuration, and it is the same for every host. (We could improve that
   by embedding a per-host configuration inside an outer config keyed by
   gateway name, so a given gateway can have its own set of execution
   chains.) I think GFac execution should be customizable for each gateway
   without interfering with other gateways.
   6. Currently each plugin implementation is independent (EC2, GSISSH, SSH,
   etc.) and they all depend on gfac-core. If there are use cases where a mix
   of these plugins has to run, we can configure them in a single execution
   chain, and as long as all the plugin artifacts are on the GFac classpath
   things should work out of the box. But in such a scenario, how do we
   configure the chain? It will be hard to configure based on a particular
   host.
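To make the recover method mentioned in #4 concrete, here is a rough sketch of
what a handler contract could look like. All the names here (ChainedHandler,
JobExecutionContext, etc.) are illustrative, not the actual gfac-core API:

```java
// Illustrative sketch only: names do not match the real gfac-core interfaces.
interface ChainedHandler {

    // Normal forward execution of this step in the chain.
    void invoke(JobExecutionContext context) throws HandlerException;

    // Called by the core when another GFac instance resumes a checkpointed
    // chain; the handler decides whether its previous work can be reused
    // or must be redone.
    void recover(JobExecutionContext context) throws HandlerException;
}

// Minimal context/exception stubs so the sketch is self-contained.
class JobExecutionContext {
    private final String experimentId;
    JobExecutionContext(String experimentId) { this.experimentId = experimentId; }
    String getExperimentId() { return experimentId; }
}

class HandlerException extends Exception {
    HandlerException(String message) { super(message); }
}
```

The point is that recovery is a first-class operation on every handler, so a
resuming GFac instance never has to guess what a half-finished step left behind.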

My suggestion is to introduce a more advanced XML configuration language with
a defined precedence for picking chains. Of course there are n! ways to order
n plugins, but in practice the number actually used will be very small. We
would define a precedence for selecting hosts (e.g. gateway name,
authentication type, host address, gateway user name, or some other property,
such as cpuCount > 10 on host stampede). We can come up with a proper
precedence order based on how specific each configuration is, and we should
be able to share an execution chain between multiple gateways, etc. (We can
discuss a nice way to structure the XML so it covers most of the limitations
explained above.)
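As a strawman, such a configuration could look something like the following.
The element and attribute names here are made up for illustration; the idea
is simply that more specific selectors win over less specific ones:

```xml
<!-- Illustrative sketch only: element/attribute names are not final. -->
<gfac-chains>
    <!-- Most specific: gateway + host + a property condition. -->
    <chain gateway="ultrascan" host="stampede.tacc.utexas.edu">
        <when property="cpuCount" greaterThan="10"/>
        <handler class="Handler1"/>
        <handler class="Handler2"/>
    </chain>

    <!-- Less specific: any SSH-authenticated host in this gateway. -->
    <chain gateway="ultrascan" authentication="SSH">
        <handler class="Handler1"/>
    </chain>

    <!-- Least specific: global fallback chain. -->
    <chain>
        <handler class="DefaultHandler"/>
    </chain>
</gfac-chains>
```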

To address problem #4: if we decide GFac has to act smart without failing the
execution, we could come up with some fault chain with its own precedence.
But since the App Catalog already has a way to define precedence, the
Orchestrator can instead fall back and submit another job to GFac, so GFac
can act in a stateless way for a particular request (everything from the App
Catalog is finalized when the request reaches GFac; GFac just has to find the
right set of handlers for the request).
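One simple way to realize "most specific configuration wins" when GFac picks
the handler set for a request is to score each configured chain by how many of
its selectors match and take the highest score. This is purely a sketch under
my own assumptions (the selector set and scoring are not an agreed design):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: selectors and scoring are assumptions, not a design decision.
class ChainConfig {
    final String gateway;        // null = wildcard
    final String authentication; // null = wildcard
    final String host;           // null = wildcard
    final List<String> handlers;

    ChainConfig(String gateway, String authentication, String host, List<String> handlers) {
        this.gateway = gateway;
        this.authentication = authentication;
        this.host = host;
        this.handlers = handlers;
    }

    // Score = number of matching selectors; -1 means "does not apply at all".
    int matchScore(String gw, String auth, String hostAddr) {
        int score = 0;
        if (gateway != null)        { if (!gateway.equals(gw)) return -1; score++; }
        if (authentication != null) { if (!authentication.equals(auth)) return -1; score++; }
        if (host != null)           { if (!host.equals(hostAddr)) return -1; score++; }
        return score;
    }
}

class ChainSelector {
    // Returns the handler list of the most specific matching chain, or null.
    static List<String> select(List<ChainConfig> configs, String gw, String auth, String host) {
        ChainConfig best = null;
        int bestScore = -1;
        for (ChainConfig c : configs) {
            int s = c.matchScore(gw, auth, host);
            if (s > bestScore) { bestScore = s; best = c; }
        }
        return best == null ? null : best.handlers;
    }
}
```

With a rule like this, a wildcard-only chain acts as the global fallback, and
adding a more specific chain for one host never disturbs the others.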

We can also change the static nature of the configuration by storing it in
the registry, so that once we have a proper Admin UI it can be modified by an
admin while GFac is up and running. If a new requirement comes in for a
particular gateway, we can implement a plugin and configure it there.

If you have any ideas, that would be great.

Regards
Lahiru



[1]
https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Application+Catalog
[2]
https://github.com/apache/airavata/blob/master/modules/configuration/server/src/main/resources/gfac-config.xml
[3]
https://github.com/apache/airavata/tree/master/modules/commons/gfac-schema/src/main/resources/schemas




-- 
Research Assistant
Science Gateways Group
Indiana University
