airavata-dev mailing list archives

From Danushka Menikkumbura <danushka.menikkumb...@gmail.com>
Subject Big data challenges in Airavata
Date Tue, 25 Sep 2012 00:10:18 GMT
Hi all,

I am a student in the 2012 M.Sc. (CS) batch at the University of Moratuwa, Sri
Lanka. Big data is one of my research areas, and I am currently looking
into the possibilities and challenges of bringing big data
capabilities to science gateways, under the supervision of Dr. Shahani
Weerawarana. From what I have gathered so far, I understand
that Airavata is lacking in this area.

Support for big data in Airavata could take several forms.

1. Simply make big data techniques available during workflow execution.
This could be in the form of MapReduce (Hadoop), BigTable-style data models
(Cassandra), etc. The idea is to handle the huge data volumes mentioned in
[1] (e.g. the projected 700 TB/sec data flood off the SKA [2] in the near future).

2. Using a big-data-ready distributed filesystem (e.g. HDFS) as the core
filesystem of Airavata and making it available across the framework.

3. Addressing the challenges related to data provenance [3], [4].
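As context for option 1, the programming model that Hadoop implements can be sketched in a few lines of plain Python. This is only a toy word-count illustration of the map/shuffle/reduce idea, not the Hadoop API; the function names `map_phase` and `reduce_phase` are my own.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group the pairs by key and sum the counts."""
    counts = {}
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        counts[key] = sum(count for _, count in group)
    return counts

docs = ["big data in science gateways", "big data challenges"]
word_counts = reduce_phase(map_phase(docs))
```

In a real Hadoop deployment the map and reduce functions run in parallel across the cluster and the shuffle moves intermediate pairs over the network, which is what makes the model scale to the data volumes discussed in [1].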

I believe you can see Airavata more clearly from these perspectives, and
perhaps you have already given thought to these aspects.

Please share your thoughts and help me understand what I should actually
look into.

[1] - http://www.slideshare.net/Hadoop_Summit/big-data-challenges-at-nasa
[2] - http://en.wikipedia.org/wiki/Square_Kilometre_Array
[3] - http://rac.uits.iu.edu/sites/default/files/SimmhanICWS06.pdf
[4] - http://bit.ly/PC2Eq4

Thanks,
Danushka
