reef-dev mailing list archives

From Ashvin A
Subject Introducing Heron on REEF + YARN
Date Fri, 01 Jul 2016 00:43:54 GMT

I recently used REEF to enable Heron [1] deployment on YARN clusters. I
wanted to share some information about the work with this community.

Heron is a successor to Apache Storm for stream processing, recently open
sourced by Twitter [2]. Initially, Heron supported only Aurora and Mesos
clusters. My goal was to add YARN cluster support, so I used the REEF
framework to develop a custom Heron scheduler [3] for YARN.

Heron executes user workflows, called topologies [4], to process data
streams. Each topology is a long-running service, so it would run as a
long-running YARN job. The topology must be restarted whenever a user
updates its code. With these initial requirements in mind, I evaluated
Slider, the YARN API, and REEF for implementing this scheduler.

I found that REEF addressed these requirements very well. In particular,
the ability to retain containers when a topology is restarted significantly
reduced downtime. Since the user already creates a topology package, adding
another packaging layer, as Slider requires, seemed unnecessary. REEF's
event model for exception and failure management was also very useful, and
I was able to get the initial version running quickly.

The YARN scheduler is still evolving. Currently, the biggest missing piece
is high availability (HA) for the YARN ApplicationMaster (AM). I have
written a user doc [5] on deploying Heron to a YARN cluster, in case you are
interested in trying it.

I am very impressed with the REEF framework and look forward to
participating in the community.


Disclaimer: I am working at Microsoft

