airavata-dev mailing list archives

From "Suresh Marru (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRAVATA-1646) [GSoC] Brainstorm Airavata Data Management Needs
Date Sun, 29 Mar 2015 00:18:53 GMT

    [ https://issues.apache.org/jira/browse/AIRAVATA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385578#comment-14385578 ]

Suresh Marru commented on AIRAVATA-1646:
----------------------------------------

Hi Chris,

I fully agree we need to bring this to the surface again. A couple of academic projects
(one undergraduate team, one master's student) explored using OODT for Airavata's data management
needs; I need to check on that. Rishi Verma and a few of us also discussed some synergies [1].
How about we find time at ApacheCon to make plans, so we can pursue GSoC, a summer
of code, or other opportunities to make progress?

[1] - http://airavata.markmail.org/thread/5uabiaceuj2eqayl

> [GSoC] Brainstorm Airavata Data Management Needs
> ------------------------------------------------
>
>                 Key: AIRAVATA-1646
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-1646
>             Project: Airavata
>          Issue Type: Brainstorming
>            Reporter: Suresh Marru
>              Labels: gsoc, gsoc2015, mentor
>
> Currently Airavata focuses on Execution Management, and the Registry Sub-System (with
app, resource and experiment catalogs) captures metadata about applications and executions.
There have been a few efforts (primarily from student projects) to explore this data management void. It would be
good to concretely propose data management solutions for input data registration, input
and generated data retrieval, data transfers and replication management.
> Metadata Catalog: In addition, current metadata management is based on shredding Thrift
data models into a MySQL/Derby schema. This is described in [1]. We have discussed extensively
the use of object store databases, concluding that the requirements need to be understood more
systematically. A good standalone task would be to understand current metadata management
and propose alternative solutions with proof-of-concept implementations. Once the community
is convinced, we can then plan on implementing them in production.
> Provenance: Airavata could be enhanced to capture provenance to organize the data for
reuse, discovery, comparison and sharing. This is a well-explored field. There might be
compelling third-party solutions. In particular, it would be good to explore the big data space
and identify what can be leveraged (either concepts or, even better, implementations).
> Auditing and Traceability: As Airavata mediates executions on behalf of gateways, it
has to strike a balance between abstracting the compute resource interactions and, at the same
time, providing a transparent execution trace. This will bloat the amount of data to be catalogued.
A good effort would be to understand the current extent of Airavata audits and provide suggestions.

> BigData Leverage: Airavata needs to leverage the influx of tools in this space. Any suggestions
of relevant tools that would enhance the Airavata experience would be a good fit.
> [1] - https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Data+Models+0.12
> [2] - http://markmail.org/thread/4lguliiktjohjmsd



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
