airavata-dev mailing list archives

From "Christie, Marcus Aaron" <>
Subject Re: [GSoC Plan of Attack] Choosing Apache Spark
Date Mon, 22 May 2017 14:11:18 GMT

Looks like you are making progress, which is great. However, I’m not quite sure what problem
you are trying to solve. Is there a write-up or something on the problem you are trying to solve?



On May 17, 2017, at 9:13 PM, Apoorv Palkar <<>>

Hey Dev,

I have started my GSoC here @ Indiana University. I have chosen to investigate Spark over
Storm/Flink for our distributed model, because Storm/Flink are generally better suited
for live event streaming. We are analyzing the batch processing case first and will then
consider live streaming. Spark is the best fit for this because it supports batch processing
through the core engine and live processing through the Spark Streaming library.
Over the past 4 days I configured the Spark standalone cluster manager to work with worker
node virtual machines on AWS EC2. Because Amazon is a paid service, we have decided to switch
to the JetStream/OpenStack API. For now, I am using Spark Standalone as the cluster manager
between the core engine and the workers. I'm also investigating Mesos or YARN (via Hadoop) as
future cluster managers for Airavata.

Any suggestions would be good.

Apoorv Palkar
