openwhisk-dev mailing list archives

From Samuel Hjelmfelt <>
Subject Re: PR to enable actions on YARN
Date Sat, 23 Feb 2019 01:20:51 GMT

Hi Rodric and Carlos,

Apache Hadoop has three major components: HDFS (distributed filesystem), MapReduce (distributed
batch processing engine), and YARN (Yet Another Resource Negotiator, a container engine). While
MapReduce has been largely replaced by Apache Tez, Apache Spark, and Apache Flink, HDFS and
YARN are still widely used for data analytics use cases.

YARN is unique as a container engine because, unlike Mesos and Kubernetes, it was designed
for ephemeral, short-lived containers rather than for long-running microservices. The jobs
and queries that run on YARN are split into small tasks that run to completion and generally
last only seconds or perhaps minutes. Over the last couple of years, YARN has been expanding
its support for long-running use cases, but it is still focused on data-driven workloads rather than
more generic microservice use cases (like web apps). The primary long-running technologies
on YARN are currently Spark Streaming and TensorFlow. Here is an article from LinkedIn about
why they created a project for TensorFlow on YARN. A similar case could be made for OpenWhisk:

Bringing OpenWhisk onto YARN makes FaaS more accessible to the thousands of organizations with
existing Hadoop clusters. Between Cloudera's 2,000+ customers; Azure, AWS, and GCP cloud
customers; and self-supporting organizations like Netflix, the install base of YARN is
very high and still growing.


This PR is a first level of integration, but OpenWhisk could leverage YARN's focus on ephemeral
containers more fully to improve scalability and performance. Here is an interesting
article from Microsoft on the scalability of YARN:

Sam Hjelmfelt
