airavata-dev mailing list archives

From John Weachock <>
Subject Re: [DISCUSS] Data models for 0.16 and beyond
Date Wed, 01 Jul 2015 03:29:46 GMT
Excellent! Do you have any pointers to where I can look to start reading
the code and begin adding the necessary features to the registry?


On Tue, Jun 30, 2015 at 5:05 AM, Supun Nakandala <>

> Hi John,
> Even though the current thrift models have only one status entry, in the
> database we maintain all the state transitions (i.e. all the status
> entries). But when retrieving an experiment, process, task, or job, only the
> latest status is returned, based on the creation timestamp. So at the
> registry level we can support your requirement. What is still needed are
> thrift models that can transfer that data via the APIs/CPIs.
> On Tue, Jun 30, 2015 at 1:28 PM, John Weachock <>
> wrote:
>> Hi Supun,
>> Sorry for sending this message so late!
>> Last week I discussed a change to the data models with Suresh regarding
>> task / job / experiment / etc statuses. Currently, each item has a single
>> status ID that points to a status that's updated on every change. However, if
>> each item contained a *list* of status IDs, and each status change
>> created a *new* status entry, we can record data about experiment run
>> times, which could be used in future versions to assist in benchmark and
>> runtime prediction efforts. Additionally, users could be provided the
>> information about the progression of their experiment.
>> Thanks,
>> John
>> On Sun, Jun 14, 2015 at 1:56 PM, Supun Nakandala <> wrote:
>>> Hi All,
>>> I came up with the initial version of the schema for the new experiment
>>> catalog. It is very similar to the existing model, with a few changes:
>>> 1. In Experiments I have used one text field for email addresses, with
>>> the intention of storing a comma-separated email list. The idea was to avoid
>>> another DB table join. Likewise, in the Errors tables I have used a single text
>>> field for storing parent error ids, with the same intention.
>>> 2. I have used separate tables for ExperimentErrors, ProcessErrors, and
>>> TaskErrors rather than a single Errors table. The idea is to avoid
>>> composite ids (with some ids null) and to avoid filtering for the
>>> correct type of errors at the code level (for example when retrieving
>>> experiment errors). This also eases data retrieval at the JPA level. I
>>> have used the same concept for the Statuses, Inputs, and Outputs tables.
>>> 3. Since there are some performance issues in PGA-related operations when
>>> retrieving experiment-related data, I created a view called
>>> experiment_summaries, which underneath joins several tables and gives the
>>> required data in one view. We can create a JPA model for this view and use
>>> it for PGA-related (including some of the Admin Dashboard) operations. I
>>> hope this will solve the issue.
>>> I have attached the schema diagram herewith. Please check it and let me
>>> know if anything is wrong, needs to be changed or improved.
>>> If things look good, as the next step I would like to suggest that we
>>> brainstorm different queries that we will run on this data and check
>>> whether the data model can support those queries and the expected
>>> performance.
>>> Thanks
>>> Supun
>>> On Fri, Jun 12, 2015 at 6:39 PM, Suresh Marru <> wrote:
>>>> Hi All,
>>>> With the experience of adapting thrift data models for Airavata in the past
>>>> couple of years, it's time for us to revisit them. The most persistent criticism
>>>> has been that the data models are complex. Next, the data models and
>>>> architecture evolved in parallel, and the implementations did not always
>>>> match the intended models. In an effort to address these issues, let's first
>>>> discuss the minimal required data models.
>>>> We need to conform the models to the general principle of Experiments
>>>> deriving into a Process or a Workflow. For a single application, a process
>>>> can be directly derived from Experiment Details. For workflows, multiple
>>>> processes are created. Executing a process leads to the creation of multiple
>>>> Tasks. Task is a general type whose instances are enacted at run time based on a
>>>> generic execution sequence of environment setup, data input staging,
>>>> application execution and monitoring, data output staging, and environment
>>>> cleanup.
>>>> Please review the initial draft:
>>>> Assume lazy consensus and update the models; let's iteratively review and
>>>> update these thrift IDLs. We don't yet need to dive into code generation
>>>> until these are close to final.
>>>> @Supun, maybe you can start thinking about the database representation
>>>> of these models, and assume the details will change but the general
>>>> structure might remain.
>>>> Cheers,
>>>> Suresh
>>> --
>>> Thank you
>>> Supun Nakandala
>>> Dept. Computer Science and Engineering
>>> University of Moratuwa
> --
> Thank you
> Supun Nakandala
> Dept. Computer Science and Engineering
> University of Moratuwa
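
Illustrative sketch (not part of the original messages): the status-history idea discussed in this thread, where every state change is stored as a new entry, the latest entry is selected by creation timestamp, and run times are derived from the history, might look roughly like the following. This is only a sketch under stated assumptions and is not Airavata registry code; `StatusEntry`, `latest_status`, `time_in_states`, and the state names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class StatusEntry:
    """One state transition (hypothetical stand-in for a status row)."""
    state: str
    created_at: datetime


def latest_status(history: List[StatusEntry]) -> StatusEntry:
    """Return the most recent entry, chosen by creation timestamp."""
    return max(history, key=lambda s: s.created_at)


def time_in_states(history: List[StatusEntry]) -> Dict[str, timedelta]:
    """Sum the time spent in each state, from one transition to the next.

    The final state has no successor, so it contributes no duration.
    """
    ordered = sorted(history, key=lambda s: s.created_at)
    durations: Dict[str, timedelta] = {}
    for current, following in zip(ordered, ordered[1:]):
        elapsed = following.created_at - current.created_at
        durations[current.state] = durations.get(current.state, timedelta()) + elapsed
    return durations


# Hypothetical history for one experiment (state names are assumptions).
start = datetime(2015, 6, 30, 12, 0, 0)
history = [
    StatusEntry("CREATED", start),
    StatusEntry("EXECUTING", start + timedelta(minutes=2)),
    StatusEntry("COMPLETED", start + timedelta(minutes=47)),
]

print(latest_status(history).state)
print(time_in_states(history))
```

Keeping the full list while exposing a "latest" accessor would let existing callers keep seeing a single status, while the per-state durations become available for the run-time prediction use case raised above.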
