kafka-dev mailing list archives

From "Neha Narkhede (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-1754) KOYA - Kafka on YARN
Date Mon, 10 Nov 2014 03:00:39 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204267#comment-14204267 ]

Neha Narkhede commented on KAFKA-1754:

bq. Streaming applications such as Samza, SparkStreaming and DataTorrents will benefit from
running their workers on the same nodes as the partitions they are consuming data from. This
is now possible in YARN.

[~gwenshap] We tried deploying Samza co-located with Kafka partitions on the same box.
The main problem was I/O and memory resource isolation: there were page cache issues between
stateful jobs (which need to write to the local k/v store) and the Kafka brokers. Plus, Kafka's
partitioning style doesn't lend itself to locality (writes go to arbitrary partitions (boxes)
based on key, and reads are spread across partitions on many boxes). This resource isolation
issue is not specific to something like Samza; it will come up when running Kafka alongside
any other I/O-heavy application on YARN.
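
To illustrate the locality point, here is a minimal sketch of key-based partitioning. It is not
Kafka's actual partitioner: the CRC32 hash, the partition count, and the partition-to-broker
layout are all assumptions made up for the example. It only shows that a producer writing many
keys ends up touching partitions on several boxes, so no single box is "local" to it.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical topic size, chosen for illustration

# Hypothetical layout: partition p is hosted on broker p % 4.
PARTITION_TO_BROKER = {p: f"broker-{p % 4}" for p in range(NUM_PARTITIONS)}


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition by hashing it.

    CRC32 is a stand-in hash for this sketch, not the hash Kafka uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def brokers_touched(keys):
    """Count how many writes land on each (hypothetical) broker."""
    return Counter(PARTITION_TO_BROKER[partition_for(k)] for k in keys)


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    for broker, writes in sorted(brokers_touched(keys).items()):
        print(broker, writes)
```

With any reasonable hash the writes are spread over multiple brokers, which is the reason
co-locating a worker with "its" partitions does not help much on the write path.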

> KOYA - Kafka on YARN
> --------------------
>                 Key: KAFKA-1754
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1754
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Thomas Weise
>         Attachments: DT-KOYA-Proposal- JIRA.pdf
> YARN (Hadoop 2.x) has enabled clusters to be used for a variety of workloads, emerging
as a distributed operating system for big data applications. Initiatives are under way to bring
long-running services under the YARN umbrella, leveraging it for centralized resource management
and operations ([YARN-896] and examples such as HBase, Accumulo or Memcached through Slider).
This JIRA is to propose KOYA (Kafka on YARN), a YARN application master to launch and manage
Kafka clusters running on YARN. Brokers will use resources allocated through YARN with support
for recovery, monitoring, etc. Please see the attached proposal for more details.

This message was sent by Atlassian JIRA
