incubator-general mailing list archives

From "Mattmann, Chris A (388J)" <>
Subject Re: [VOTE] Kafka to join the Incubator
Date Tue, 28 Jun 2011 17:32:02 GMT
+1 (binding).



On Jun 28, 2011, at 10:00 AM, Jun Rao wrote:

> Hi all,
> Since the discussion on the thread of the Kafka incubator proposal is
> winding down, I'd like to call a vote.
> At the end of this mail, I've put a copy of the current proposal.  Here is
> a link to the document in the wiki:
> And here is a link to the discussion thread:
> Please cast your votes:
> [  ] +1 Accept Kafka for incubation
> [  ] +0 Indifferent to Kafka incubation
> [  ]  -1 Reject Kafka for incubation
> This vote will close 72 hours from now.
> Thanks,
> Jun
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large amounts
> of streaming data.
> == Proposal ==
> Kafka provides an extremely high throughput distributed publish/subscribe
> messaging system.  Additionally, it supports relatively long term
> persistence of messages to support a wide variety of consumers, partitioning
> of the message stream across servers and consumers, and functionality for
> loading data into Apache Hadoop for offline, batch processing.
> == Background ==
> Kafka was developed at LinkedIn to process the large amounts of events
> generated by that company's website and provide a common repository for many
> types of consumers to access and process those events. Kafka has been used
> in production at LinkedIn scale to handle dozens of types of events
> including page views, searches and social network activity. Kafka clusters
> at LinkedIn currently process more than two billion events per day.
> Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
> provide low latency message delivery but don't focus on throughput, and log
> processing systems such as Scribe and Flume, which do not provide adequate
> latency for our diverse set of consumers.  Kafka can also be inserted into
> traditional log-processing systems, acting as an intermediate step before
> further processing. Kafka focuses relentlessly on performance and throughput
> by neither introspecting message content nor indexing messages on the broker.
> We also achieve high performance by depending on Java's sendFile/transferTo
> capabilities to minimize intermediate buffer copies and relying on the OS's
> pagecache to efficiently serve up message contents to consumers. Kafka is
> also designed to be scalable and it depends on Apache ZooKeeper for
> coordination amongst its producers, brokers and consumers.
> Kafka is written in Scala. It was developed internally at LinkedIn to meet
> our particular use cases, but will be useful to many organizations facing a
> similar need to reliably process large amounts of streaming data.
> Therefore, we would like to share it with the ASF and begin developing a
> community of developers and users within Apache.
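
The zero-copy transfer mentioned in the Background section can be illustrated with a minimal sketch. It is not taken from the Kafka source: the class name `ZeroCopySketch`, the helper `sendRange`, and the temporary files are hypothetical, and a second file stands in for a consumer's socket. It only shows how `FileChannel.transferTo` hands a byte range of a file to another channel without copying it through an application buffer.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {

    // Hypothetical helper: sends `count` bytes of `logFile`, starting at
    // `position`, to `target` and returns the number of bytes transferred.
    static long sendRange(Path logFile, long position, long count,
                          WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(logFile, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) {
                // transferTo may move fewer bytes than requested, so loop.
                long n = source.transferTo(position + sent, count - sent, target);
                if (n <= 0) {
                    break; // end of file reached, or target accepts no more
                }
                sent += n;
            }
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a tiny temp file stands in for an on-disk message log.
        Path log = Files.createTempFile("segment", ".log");
        Files.write(log, "m1m2m3".getBytes(StandardCharsets.US_ASCII));

        // Assumption: a second file stands in for the consumer's connection.
        Path out = Files.createTempFile("consumer", ".out");
        try (FileChannel target = FileChannel.open(out, StandardOpenOption.WRITE)) {
            long sent = sendRange(log, 2, 4, target);
            System.out.println(sent + " bytes sent");
        }
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.US_ASCII));
    }
}
```

In a real server the target would typically be a `SocketChannel`, in which case the operating system can often move the data from its page cache to the network without passing it through user space.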
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system such
> as Kafka.  While our use case of processing events from a very large website
> like LinkedIn has driven the design of Kafka, its uses are varied and we
> expect many new use cases to emerge.  Kafka provides a natural bridge
> between near real-time event processing and offline batch processing and
> will appeal to many users.
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Kafka following the Apache meritocracy model.
> Since Kafka was open sourced we have solicited contributions via the website
> and presentations given to user groups and technical audiences.  We have had
> positive responses to these and have received several contributions and
> clients for other languages.  We plan to continue this support for new
> contributors and work with those who contribute significantly to the project
> to make them committers.
> === Community ===
> Kafka is currently being developed by engineers within LinkedIn and
> used in production in that company. Additionally, we have active users in or
> have received contributions from a diverse set of companies including
> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> presentations of Kafka and its goals garnered much interest from potential
> contributors. We hope to extend our contributor base significantly and
> invite all those who are interested in building high-throughput distributed
> systems to participate.  We have begun receiving contributions from outside
> of LinkedIn, among them clients for several languages including Ruby, PHP,
> Clojure, .NET and Python.
> To further this goal, we use GitHub issue tracking and branching facilities,
> as well as maintaining a public mailing list via Google Groups.
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha
> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> Apache as a Cassandra committer and PMC member. Neha has been an active
> contributor to several projects LinkedIn has open sourced, including Bobo,
> Sensei and Zoie. Jay has experience with open source software as the
> originator of the Project Voldemort project, as well as being active within
> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> member and a previous Apache ZooKeeper contributor.
> === Alignment ===
> The ASF is the natural choice to host the Kafka project as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Kafka.  Additionally, many other projects with which we are familiar and
> with which we expect Kafka to integrate, such as Apache Hadoop, Pig,
> ZooKeeper and log4j, are hosted by the ASF, and we will benefit from and
> provide benefit through close proximity to them.
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> internal infrastructure and is in production use.
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development.
> LinkedIn open sourced Kafka several months ago and has been receiving
> contributions since.  Jun is an Apache Cassandra committer and PMC member.
> Jay and Neha have been involved with several open source projects released
> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> Hadoop committer and PMC member.
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations and we are actively encouraging new contributors via the mailing
> lists and public presentations of Kafka.
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Kafka. However, once the
> project has a community built around it, we expect to get committers,
> developers and community members from outside the current core developers.
> Even so, because LinkedIn relies on Kafka internally, the reliance on
> salaried developers is unlikely to change.
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
> to coordinate its state amongst the brokers, consumers, and soon, the
> producers.  Kafka provides input formats to allow Hadoop MapReduce to load
> data directly from Kafka.  Kafka provides an appender to allow consuming
> data directly from Apache log4j.
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Kafka a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and Alignment
> sections.
> == Documentation ==
> Information about Kafka can be found at [] The
> following links provide more information about the project:
> * Kafka roadmap and goals: []
> * The GitHub site: []
> * Kafka overview from Jay Kreps: []
> * Kafka overview from Jakob Homan: []
> * Kafka paper at NetDB 2011: []
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009.  It was
> open sourced by LinkedIn in January 2011.  It is currently hosted on GitHub
> under the Apache license at []
> Kafka is mainly written in Scala with some performance testing code in Java.
> Several clients have been contributed in other languages, including Ruby,
> PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
> and relies on the simple build tool (sbt) as its build system and dependency
> resolution mechanism.
> == External Dependencies ==
> The dependencies all have Apache compatible licenses.
> == Cryptography ==
> Not applicable.
> == Required Resources ==
> === Mailing Lists ===
> * kafka-private for private PMC discussions (with moderated subscriptions)
> * kafka-dev
> * kafka-commits
> * kafka-user
> === Subversion Directory ===
> []
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance
> to run them whenever a new patch is submitted. This can be added after
> project creation.
> == Initial Committers ==
> * Jay Kreps
> * Jun Rao
> * Neha Narkhede
> * Jakob Homan
> * Phillip Rhodes
> * Henry Saputra
> * Chris Burroughs
> == Affiliations ==
> * Jay Kreps (LinkedIn)
> * Jun Rao (LinkedIn)
> * Neha Narkhede (LinkedIn)
> * Jakob Homan (LinkedIn)
> * Phillip Rhodes (Fogbeam Labs)
> * Henry Saputra (Cisco Systems)
> * Chris Burroughs (Clearspring Technologies)
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
> === Nominated Mentors ===
> * Alan Cabrera (Apache Member)
> * Geir Magnusson, Jr. (Apache Member and Director)
> * Owen O'Malley (Apache Member)
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.

Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
