airavata-dev mailing list archives

From Apoorv Palkar
Subject [GSoC Plan of Attack] Choosing Apache Spark
Date Thu, 18 May 2017 01:13:19 GMT
Hey Dev,

I have started my GSoC here at Indiana University. I have chosen to investigate Spark over
Storm/Flink for our distributed model, because Storm and Flink are generally better suited
to live event streaming. We are analyzing the batch processing case first and then
potentially considering live streaming. Spark fits this plan well because it supports
batch processing through the core engine and live processing through the Spark Streaming library.
Over the past 4 days I configured the Spark standalone cluster manager to work with worker
node virtual machines on AWS EC2. Because AWS is a paid service, we have decided to switch to the
Jetstream/OpenStack API. As of now, I am using Spark Standalone as the cluster manager between the
core engine and the workers. In addition, I'm investigating Mesos and Hadoop YARN as possible future
cluster managers for Airavata.
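
For reference, here is a rough sketch of how the choice between the cluster managers mentioned
above shows up when submitting a job: mostly as a different value for the --master option of
spark-submit. The host name "master-node", the file name "batch_job.py", and the helper
functions are made up for illustration, and the ports are just the usual defaults, not our
actual setup.

```python
from typing import List, Optional

# Usual default ports; a real deployment may use different ones (assumption).
DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050}
SCHEMES = {"standalone": "spark", "mesos": "mesos"}


def master_url(manager: str, host: Optional[str] = None,
               port: Optional[int] = None) -> str:
    """Build the value passed to spark-submit's --master option."""
    if manager == "yarn":
        # YARN takes no host here; the cluster is located through the
        # Hadoop configuration (HADOOP_CONF_DIR / YARN_CONF_DIR).
        return "yarn"
    if manager == "local":
        # Run on the local machine using all available cores.
        return "local[*]"
    if manager in SCHEMES:
        if host is None:
            raise ValueError(f"{manager} needs a master host")
        chosen_port = port if port is not None else DEFAULT_PORTS[manager]
        return f"{SCHEMES[manager]}://{host}:{chosen_port}"
    raise ValueError(f"unknown cluster manager: {manager}")


def submit_command(manager: str, app: str,
                   host: Optional[str] = None) -> List[str]:
    """Return the spark-submit argument list for a given cluster manager."""
    return ["spark-submit", "--master", master_url(manager, host), app]


if __name__ == "__main__":
    # Hypothetical host and application names, for illustration only.
    for mgr in ("standalone", "mesos", "yarn"):
        print(" ".join(submit_command(mgr, "batch_job.py", host="master-node")))
```

The application itself should not need to change when moving from Standalone to Mesos or YARN,
as long as it does not hard-code the master URL.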

Any suggestions would be welcome.

Apoorv Palkar
