airavata-commits mailing list archives

Subject [airavata-data-lake] 01/01: bootstrapping git repo
Date Tue, 10 Nov 2020 18:02:41 GMT
This is an automated email from the ASF dual-hosted git repository.

smarru pushed a commit to branch master
in repository

commit 3ed323a81d93a2da0cad1e748bc6e825102cb06d
Author: Suresh Marru <>
AuthorDate: Tue Nov 10 13:02:25 2020 -0500

    bootstrapping git repo
--- | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/ b/
new file mode 100644
index 0000000..fd06c5a
--- /dev/null
+++ b/
@@ -0,0 +1,35 @@
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+# Apache Airavata Data Lake
+Apache Airavata use cases enable capture of data from observational and experimental instruments
and computations resulting from Airavata's orchestration capabilities. As data volumes grow,
harvesting the data, capturing metadata, and presenting it for easy and controlled
access become unmanageable. 
+Airavata Data Lake will bundle standalone services to catalog data in various data stores,
extract and capture semantics and metadata descriptions of the data, and preserve associated
data provenance. The data lake will provide APIs and query and search capabilities to programmatically
search and retrieve data, and will support building interactive user and data analysis applications
on top of it. 
+![Airavata Data Lake Overview](
+Airavata Data Lake will provide file watcher and other trigger capabilities to ingest data
from scientific instruments as it becomes available. The framework will enable pluggable data
parsers to read structured and unstructured data files and extract meaningful descriptions.

+A bundled data replica catalog will associate pointers to data at multiple storage locations.
The replica catalog maps logical file names to their physical locations. Data Lake client SDKs
will provide API functions to query replica locations and resolve them into multiple physical file
locations. The client will be bundled with access protocols to retrieve the data or to embed it
into computational pipelines. 
+Interfacing with Airavata [Managed File Transfer Service](
data can be moved and archived into longer-term persistent storage such as tape archives. The
data archives will be indexed and searchable.
+Data Lake's provenance will capture the parameters that influenced the production
or modification of the data. An abstraction API will enable plugging in fine-grained provenance
based on Airavata tenant context. Interfacing with Airavata Orchestration Services, pointers
to the experiment catalog will enable reconstruction of the underlying computations.
\ No newline at end of file
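
The README's "pluggable data parsers" idea can be illustrated with a small registry keyed by file extension. This is only a sketch under stated assumptions: the commit contains no code, so `register_parser`, `extract_metadata`, and the metadata fields below are hypothetical names chosen for illustration, not part of any Airavata API.

```python
import csv
import json
from pathlib import Path
from typing import Callable, Dict

# Hypothetical parser signature: takes a file path, returns a metadata dict.
Parser = Callable[[Path], dict]

# Registry of parsers keyed by lowercase file suffix (an assumed convention).
_PARSERS: Dict[str, Parser] = {}


def register_parser(suffix: str):
    """Decorator that plugs a parser in for one file extension."""
    def decorator(func: Parser) -> Parser:
        _PARSERS[suffix.lower()] = func
        return func
    return decorator


@register_parser(".json")
def parse_json(path: Path) -> dict:
    """Describe a JSON file by its top-level keys."""
    with path.open() as fh:
        doc = json.load(fh)
    keys = sorted(doc) if isinstance(doc, dict) else []
    return {"format": "json", "top_level_keys": keys}


@register_parser(".csv")
def parse_csv(path: Path) -> dict:
    """Describe a CSV file by its header columns and data row count."""
    with path.open(newline="") as fh:
        rows = list(csv.reader(fh))
    header = rows[0] if rows else []
    return {"format": "csv", "columns": header, "rows": max(len(rows) - 1, 0)}


def extract_metadata(path: Path) -> dict:
    """Dispatch to the registered parser, falling back to basic file facts."""
    parser = _PARSERS.get(path.suffix.lower())
    if parser is None:
        # Unknown or unstructured file: record only its size.
        return {"format": "unknown", "size_bytes": path.stat().st_size}
    return parser(path)
```

A file watcher would call `extract_metadata` on each newly arrived file; supporting a new instrument format then means registering one more function rather than changing the ingest loop.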

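
The replica catalog described in the README maps one logical file name to several physical locations. A minimal in-memory sketch of that mapping follows; the class name `ReplicaCatalog` and its `register` and `resolve` methods are assumptions made for illustration, since the commit defines no interface.

```python
from collections import defaultdict
from typing import Dict, List


class ReplicaCatalog:
    """In-memory map from a logical file name to its physical replica locations."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = defaultdict(list)

    def register(self, logical_name: str, physical_url: str) -> None:
        """Record a replica location, ignoring exact duplicates."""
        locations = self._replicas[logical_name]
        if physical_url not in locations:
            locations.append(physical_url)

    def resolve(self, logical_name: str) -> List[str]:
        """Return every known location, or an empty list if none are registered."""
        # .get() avoids creating an empty entry for unknown names.
        return list(self._replicas.get(logical_name, []))
```

A client could then register two copies of the same dataset (the names and URLs here are made up) and resolve them in one call:

```python
catalog = ReplicaCatalog()
catalog.register("lfn://experiment-42/output.h5", "sftp://storage-a.example.org/data/output.h5")
catalog.register("lfn://experiment-42/output.h5", "s3://example-bucket/experiment-42/output.h5")
locations = catalog.resolve("lfn://experiment-42/output.h5")
```

A production catalog would persist this mapping and scope it per tenant; the sketch shows only the logical-to-physical resolution step.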