airavata-dev mailing list archives

From Amila Jayasekara <thejaka.am...@gmail.com>
Subject Re: Persisting GFac job data
Date Tue, 21 May 2013 16:29:46 GMT
Hi Saminda,

Great suggestion. Also +1 for Danushka's proposal to have
serialized/de-serialized data.
A few suggestions:
1. In addition to successful/error statuses we need other statuses for nodes
and workflows.
E.g.:
   node - started, submitted, in-progress, failed, successful etc.
2. This data will be useful in implementing FT (fault tolerance) and load
balancing in each component. Some time back we had discussions about making
GFac stateless, so who is going to populate this data structure and persist
it?
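The node statuses suggested above could be sketched as Java enums along the following lines (the class and enum names here are illustrative, not an actual Airavata API):

```java
// Hypothetical status enums for node and workflow execution state.
public class GFacStatuses {

    // Per-node lifecycle states, as suggested for node executions.
    public enum NodeStatus {
        STARTED, SUBMITTED, IN_PROGRESS, FAILED, SUCCESSFUL
    }

    // Workflow-level states would mirror the node states at a coarser grain.
    public enum WorkflowStatus {
        STARTED, RUNNING, FAILED, SUCCESSFUL
    }

    // A terminal status means no further transitions are expected,
    // which matters for FT logic deciding whether a retry is needed.
    public static boolean isTerminal(NodeStatus s) {
        return s == NodeStatus.FAILED || s == NodeStatus.SUCCESSFUL;
    }
}
```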

Thanks
Amila


On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <samindaw@gmail.com> wrote:

> That is an excellent idea. We can have the job data field be the
> designated GFac job serialized data, and whatever GFacProvider is used
> should adhere to it.
>
> I'm still inclined to keep the rest of the fields to ease querying for
> the required data. For example, if we wanted all execution attempts for a
> particular node of a workflow, or wanted to know which application
> descriptions are faster or more reliable in execution, we could let the
> query language deal with it. wdyt?
>
>
> On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <
> danushka.menikkumbura@gmail.com> wrote:
>
> > Saminda,
> >
> > I think the data container does not need to have a generic format. We can
> > have a base class that facilitates object serialization/deserialization
> > and let specific metadata structures extend it as required. We get the
> > Registry API to serialize objects and save them in a metadata table (with
> > just two columns?) and to deserialize them as they are loaded off the
> > registry.
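The base-class idea above might look roughly like the following sketch, where provider-specific metadata classes extend a common serializable parent and the registry stores only the resulting bytes (e.g. in a two-column id-to-blob table). All class and method names are illustrative assumptions, not actual Airavata API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical base class: subclasses carry provider-specific metadata,
// and the registry deals only in serialized byte arrays.
public abstract class SerializableJobData implements Serializable {

    // Serialize this object into the bytes stored in the registry blob column.
    public byte[] toBytes() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(this);
        }
        return bos.toByteArray();
    }

    // Deserialize a blob loaded off the registry back into an object.
    public static SerializableJobData fromBytes(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SerializableJobData) ois.readObject();
        }
    }
}

// Example provider-specific structure, e.g. for a GRAM job (hypothetical).
class GramJobData extends SerializableJobData {
    String rsl;
    GramJobData(String rsl) { this.rsl = rsl; }
}
```

With this shape, a GRAM provider would round-trip its RSL through `toBytes`/`fromBytes` without the registry schema knowing anything about GRAM.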
> >
> > Danushka
> >
> >
> > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <samindaw@gmail.com>
> > wrote:
> >
> > > It has become more and more apparent that saving the data related to
> > > executing jobs from GFac can be useful for many reasons, such as:
> > >
> > > debugging
> > > retrying
> > > to make smart decisions on reliability/cost etc.
> > > statistical analysis
> > >
> > > Thus we thought of saving the data related to GFac jobs in the registry
> > > in order to facilitate features such as the above in the future.
> > >
> > > However, a GFac job is potentially any sort of computing resource access
> > > (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a generalized
> > > data structure that can hold the data of any type of resource. The
> > > following are the suggested data to save for a single GFac job execution:
> > >
> > > *experiment id, workflow instance id, node id* - pinpoint the node
> > > execution
> > > *service, host, application description ids *- pinpoint the descriptors
> > > responsible
> > > *local job id* - the unique job id retrieved/generated per execution
> > > [PRIMARY KEY]
> > > *job data* - data related to executing the job (eg: the rsl in GRAM)
> > > *submitted, completed time*
> > > *completed status* - whether the job was successful or ran into errors
> > > etc.
> > > *metadata* - custom field to add anything user wants
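The field list above could be sketched as a plain Java data holder like the one below. Field names follow the listed fields; the types (strings and epoch-millisecond longs) are assumptions, not the actual Airavata schema:

```java
// Hypothetical per-job record mirroring the suggested fields.
public class GFacJobData {
    // Pinpoint the node execution
    public String experimentId;
    public String workflowInstanceId;
    public String nodeId;

    // Pinpoint the descriptors responsible
    public String serviceDescriptionId;
    public String hostDescriptionId;
    public String applicationDescriptionId;

    // Unique job id retrieved/generated per execution [PRIMARY KEY]
    public String localJobId;

    // Data related to executing the job (eg: the rsl in GRAM)
    public String jobData;

    // Submitted and completed timestamps (epoch milliseconds, assumed)
    public long submittedTime;
    public long completedTime;

    // Whether the job was successful or ran into errors etc.
    public String completedStatus;

    // Custom field to add anything the user wants
    public String metadata;
}
```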
> > >
> > > Your feedback is most welcome. The API-related changes will also be
> > > discussed once we have a proper data structure. We are hoping to
> > > implement this within the next few days.
> > >
> > > Thanks,
> > > Saminda
> > >
> >
>
