airavata-dev mailing list archives

From Supun Nakandala <supun.nakand...@gmail.com>
Subject Re: [DISCUSS] Data models for 0.16 and beyond
Date Wed, 01 Jul 2015 08:02:52 GMT
As a starting point, you can look at this class:
https://github.com/apache/airavata/blob/master/modules/registry/registry-core/src/main/java/org/apache/airavata/registry/core/experiment/catalog/impl/ExperimentCatalogImpl.java
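
To make the idea from the quoted thread below concrete (every status entry is kept, but only the latest one by creation timestamp is returned), here is a minimal sketch. It is not the registry code: StatusHistory, StatusEntry, and all method names are hypothetical and only illustrate keeping the full history while still exposing the latest status.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in Airavata.
public class StatusHistory {

    // One recorded state transition with its creation timestamp.
    public static final class StatusEntry {
        public final String state;
        public final long createdAtMillis;

        public StatusEntry(String state, long createdAtMillis) {
            this.state = state;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private static final Comparator<StatusEntry> BY_CREATION =
            Comparator.comparingLong(e -> e.createdAtMillis);

    private final List<StatusEntry> entries = new ArrayList<>();

    // Every status change appends a new entry instead of overwriting one.
    public void record(String state, long createdAtMillis) {
        entries.add(new StatusEntry(state, createdAtMillis));
    }

    // Current behaviour described in the thread: only the latest entry is returned.
    public Optional<StatusEntry> latest() {
        return entries.stream().max(BY_CREATION);
    }

    // Requested behaviour: expose all transitions, ordered by creation time.
    public List<StatusEntry> history() {
        List<StatusEntry> sorted = new ArrayList<>(entries);
        sorted.sort(BY_CREATION);
        return sorted;
    }

    // Time between the first and the last recorded transition.
    public long elapsedMillis() {
        List<StatusEntry> h = history();
        if (h.isEmpty()) {
            return 0L;
        }
        return h.get(h.size() - 1).createdAtMillis - h.get(0).createdAtMillis;
    }
}
```

With the whole list available, run times between transitions can be derived directly, which is the kind of data John mentions for benchmarking and runtime prediction.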


On Wed, Jul 1, 2015 at 8:59 AM, John Weachock <jweachock@gmail.com> wrote:

> Excellent! Do you have any pointers to where I can look to start reading
> the code and begin adding the necessary features to the registry?
>
> Thanks!
>
> On Tue, Jun 30, 2015 at 5:05 AM, Supun Nakandala <
> supun.nakandala@gmail.com> wrote:
>
>> Hi John,
>>
>> Even though the current Thrift models have only one status entry, the
>> database keeps all the state transitions (i.e. all the status entries).
>> When retrieving an experiment, process, task, or job, only the latest
>> status is returned, based on the creation timestamp. So at the registry
>> level we can support your requirement. What is missing is the Thrift
>> models needed to transfer that data via the APIs/CPIs.
>>
>>
>> On Tue, Jun 30, 2015 at 1:28 PM, John Weachock <jweachock@gmail.com>
>> wrote:
>>
>>> Hi Supun,
>>>
>>> Sorry for sending this message so late!
>>>
>>> Last week I discussed a change to the data models with Suresh regarding
>>> task / job / experiment / etc. statuses. Currently, each item has a single
>>> status ID that points to a status that's updated on every change. However,
>>> if each item contained a *list* of status IDs, and each status change
>>> created a *new* status entry, we could record data about experiment run
>>> times, which could be used in future versions to assist in benchmarking
>>> and runtime prediction efforts. Additionally, users could be shown the
>>> progression of their experiment.
>>>
>>> Thanks,
>>>
>>> John
>>>
>>>
>>> On Sun, Jun 14, 2015 at 1:56 PM, Supun Nakandala <
>>> supun.nakandala@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I came up with the initial version of the schema for the new experiment
>>>> catalog. It is very similar to the existing model, with a few changes:
>>>>
>>>> 1. In Experiments I have used one text field for email addresses, with
>>>> the intention of storing a comma-separated list of emails. The idea was to
>>>> avoid another DB table join. Likewise, in the Errors tables I have used a
>>>> single text field for storing parent error ids, with the same intention.
>>>>
>>>> 2. I have used separate tables for ExperimentErrors, ProcessErrors, and
>>>> TaskErrors rather than a single Errors table. The idea is to avoid
>>>> composite ids (with some ids null) and to avoid filtering for the correct
>>>> type of error at the code level (for example, when retrieving experiment
>>>> errors). This also eases data retrieval at the JPA level. I have used the
>>>> same concept for the Statuses, Inputs, and Outputs tables.
>>>>
>>>> 3. Since there are some performance issues in PGA-related operations when
>>>> retrieving experiment-related data, I created a view called
>>>> experiment_summaries, which joins several tables underneath and gives the
>>>> required data in one view. We can create a JPA model for this view and use
>>>> it for PGA-related operations (including some of the Admin Dashboard). I
>>>> hope this will solve the issue.
>>>>
>>>> I have attached the schema diagram herewith. Please check it and let
>>>> me know if anything is wrong or needs to be changed or improved.
>>>>
>>>> If things look good, as the next step I would like to suggest that we
>>>> brainstorm different queries that we will run on this data and check
>>>> whether the data model can support those queries and the expected
>>>> performance.
>>>>
>>>> Thanks
>>>> Supun
>>>>
>>>> On Fri, Jun 12, 2015 at 6:39 PM, Suresh Marru <smarru@apache.org>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> With the experience of adapting Thrift data models for Airavata over
>>>>> the past couple of years, it's time for us to revisit them. The most
>>>>> persistent criticism has been that the data models are complex. Next,
>>>>> the data models and architecture evolved in parallel, and the
>>>>> implementations did not always match the intended models. In an effort
>>>>> to address these issues, let's first discuss the minimal required data
>>>>> models.
>>>>>
>>>>> We need to conform the models to the general principle of Experiments
>>>>> deriving into a Process or a Workflow. For a single application, a
>>>>> process can be directly derived from the Experiment Details. For
>>>>> workflows, multiple processes are created. Executing a process leads to
>>>>> the creation of multiple Tasks. Task is a general type that is enacted
>>>>> at run time based on a generic execution sequence of environment setup,
>>>>> data input staging, application execution and monitoring, data output
>>>>> staging, and environment cleanup.
>>>>>
>>>>> Please review the initial draft:
>>>>>
>>>>> https://github.com/apache/airavata/tree/master/thrift-interface-descriptions/airavata-data-models
>>>>>
>>>>> Assume lazy consensus and update the models; let's iteratively review
>>>>> and update these Thrift IDLs. We don't need to dive into code
>>>>> generation until these are close to final.
>>>>>
>>>>> @Supun, maybe you can start thinking about the database representation
>>>>> of these models, assuming the details will change but the general
>>>>> structure might remain.
>>>>>
>>>>> Cheers,
>>>>> Suresh
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thank you
>>>> Supun Nakandala
>>>> Dept. Computer Science and Engineering
>>>> University of Moratuwa
>>>>
>>>
>>>
>>
>>
>> --
>> Thank you
>> Supun Nakandala
>> Dept. Computer Science and Engineering
>> University of Moratuwa
>>
>
>


-- 
Thank you
Supun Nakandala
Dept. Computer Science and Engineering
University of Moratuwa
