incubator-general mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: [PROPOSAL] Kafka for the Apache Incubator
Date Thu, 23 Jun 2011 15:07:19 GMT
Wow, very nice proposal guys!
Tommaso

2011/6/22 Jun Rao <junrao@gmail.com>

> Hi,
>
> I would like to propose Kafka as an Apache Incubator project.  Kafka is a
> distributed, high-throughput, publish-subscribe system for processing large
> amounts of streaming data.
>
> Here's a link to the proposal in the Incubator wiki
> http://wiki.apache.org/incubator/KafkaProposal
>
> I've also pasted the initial contents below.
>
> Thanks,
>
> Jun
>
> == Abstract ==
> Kafka is a distributed publish-subscribe system for processing large amounts
> of streaming data.
>
> == Proposal ==
> Kafka provides an extremely high-throughput, distributed publish/subscribe
> messaging system.  Additionally, it supports relatively long-term
> persistence of messages to support a wide variety of consumers,
> partitioning of the message stream across servers and consumers, and
> functionality for loading data into Apache Hadoop for offline batch
> processing.
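To illustrate what "partitioning of the message stream across servers and consumers" can mean, here is a minimal sketch. It assumes, hypothetically, that each message carries a string key that is hashed to one of a fixed number of partitions, and that the consumers in a group split the partitions into contiguous ranges. The proposal does not describe Kafka's actual scheme; `PartitionSketch`, `partitionFor` and `partitionsForConsumer` are illustrative names only, not Kafka's API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Kafka's API): spread a keyed message stream over
 * a fixed number of partitions, then divide those partitions among the
 * consumers of one group.
 */
final class PartitionSketch {

    private PartitionSketch() {}

    /** Map a key to a partition; equal keys always land in the same partition. */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // String.hashCode is specified by the JDK, so the mapping is stable
        // across JVMs; masking the sign bit keeps the result non-negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    /**
     * Range-style assignment (an assumption for illustration): consumer i of n
     * receives a contiguous block of partitions, and the first
     * (numPartitions % n) consumers receive one extra partition each.
     */
    static List<Integer> partitionsForConsumer(int consumerIndex, int numConsumers, int numPartitions) {
        if (numConsumers <= 0 || consumerIndex < 0 || consumerIndex >= numConsumers) {
            throw new IllegalArgumentException("invalid consumer index or count");
        }
        int base = numPartitions / numConsumers;
        int extra = numPartitions % numConsumers;
        int start = consumerIndex * base + Math.min(consumerIndex, extra);
        int count = base + (consumerIndex < extra ? 1 : 0);
        List<Integer> assigned = new ArrayList<>();
        for (int p = start; p < start + count; p++) {
            assigned.add(p);
        }
        return assigned;
    }
}
```

With 10 partitions and 3 consumers, this sketch gives consumer 0 partitions 0 to 3, consumer 1 partitions 4 to 6, and consumer 2 partitions 7 to 9. Because a key always hashes to the same partition, all events for one key stay together, while different partitions can live on different servers and be read by different consumers in parallel.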
>
> == Background ==
> Kafka was developed at LinkedIn to process the large volume of events
> generated by that company's website and to provide a common repository for
> many types of consumers to access and process those events.  Kafka has been
> used in production at LinkedIn scale to handle dozens of types of events,
> including page views, searches and social network activity.  Kafka clusters
> at LinkedIn currently process more than two billion events per day.
>
> Kafka fills the gap between messaging systems such as Apache ActiveMQ,
> which can provide high-volume messaging but lack persistence of those
> messages, and log processing systems such as Scribe and Flume, which do not
> provide adequate latency for our diverse set of consumers.  Kafka can also
> be inserted into traditional log-processing systems, acting as an
> intermediate step before further processing.  Kafka focuses relentlessly on
> performance and throughput by neither introspecting into message content
> nor indexing messages on the broker.  We also achieve high performance by
> relying on sendfile-style transfers (Java's FileChannel.transferTo) to
> minimize intermediate buffer copies, and on the OS's page cache to
> efficiently serve message contents to consumers.
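The transferTo technique mentioned above can be sketched with the JDK's `FileChannel.transferTo`. This is a minimal illustration, not Kafka's broker code: `ZeroCopySketch` and `sendSlice` are hypothetical names, and whether the JDK actually skips the user-space copy depends on the platform and on the target channel (on a broker it would typically be a consumer's socket channel).

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

/**
 * Hypothetical sketch: serve a byte range of an on-disk message log to a
 * consumer. When the target is a socket channel, the JDK can hand the copy
 * to the OS (sendfile on Linux), so data moves from the page cache to the
 * socket without passing through an application buffer.
 */
final class ZeroCopySketch {

    private ZeroCopySketch() {}

    /** Send up to {@code count} bytes of {@code log}, starting at {@code position}, to {@code out}. */
    static long sendSlice(FileChannel log, long position, long count, WritableByteChannel out)
            throws IOException {
        long sent = 0;
        // transferTo may move fewer bytes than requested, so keep calling it.
        while (sent < count) {
            long n = log.transferTo(position + sent, count - sent, out);
            if (n <= 0) {
                // Nothing more to send, e.g. the position reached end of file.
                // Assumes a blocking target; a non-blocking socket could also
                // return 0 when its send buffer is full.
                break;
            }
            sent += n;
        }
        return sent;
    }
}
```

The method returns the number of bytes actually sent, which can be less than `count` if the requested range runs past the end of the log file.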
>
> Kafka is written in Scala and depends on Apache ZooKeeper for coordination
> amongst its producers, brokers and consumers.
>
> Kafka was developed internally at LinkedIn to meet our particular use cases,
> but it will be useful to many organizations facing a similar need to
> reliably process large amounts of streaming data.  Therefore, we would like
> to share it with the ASF and begin developing a community of developers and
> users within Apache.
>
> == Rationale ==
> Many organizations can benefit from a reliable stream processing system such
> as Kafka.  While our use case of processing events from a very large website
> like LinkedIn has driven the design of Kafka, its uses are varied and we
> expect many new use cases to emerge.  Kafka provides a natural bridge
> between near real-time event processing and offline batch processing and
> will appeal to many users.
>
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Kafka following the Apache meritocracy model.
> Since Kafka was open sourced, we have solicited contributions via the
> website and presentations given to user groups and technical audiences.  We
> have had positive responses to these and have received several
> contributions, including clients for other languages.  We plan to continue
> this support for new contributors and work with those who contribute
> significantly to the project to make them committers.
>
> === Community ===
> Kafka is currently developed by engineers within LinkedIn and used in
> production at that company.  Additionally, we have active users at, or have
> received contributions from, a diverse set of companies including MediaSift,
> SocialTwist, Clearspring and Urban Airship.  Recent public presentations of
> Kafka and its goals garnered much interest from potential contributors.  We
> hope to extend our contributor base significantly and invite all those who
> are interested in building high-throughput distributed systems to
> participate.  We have begun receiving contributions from outside of
> LinkedIn, including clients for Ruby, PHP, Clojure, .NET and Python.
>
> To further this goal, we use GitHub's issue tracking and branching
> facilities and maintain a public mailing list via Google Groups.
>
> === Core Developers ===
> Kafka is currently being developed by four engineers at LinkedIn: Neha
> Narkhede, Jun Rao, Jakob Homan and Jay Kreps.  Jun has experience within
> Apache as a Cassandra committer and PMC member.  Neha has been an active
> contributor to several projects LinkedIn has open sourced, including Bobo,
> Sensei and Zoie.  Jay has experience with open source software as the
> originator of Project Voldemort, as well as being active within the Hadoop
> ecosystem community.  Jakob is an Apache Hadoop committer and PMC member and
> a previous Apache ZooKeeper contributor.
>
> === Alignment ===
> The ASF is the natural choice to host the Kafka project, as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Kafka.  Additionally, many other projects that we are familiar with and
> expect Kafka to integrate with, such as Apache Hadoop, Pig, ZooKeeper and
> log4j, are hosted by the ASF, and we will both benefit from and provide
> benefit to them through close proximity.
>
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> internal infrastructure and is in production use.
>
> === Inexperience with Open Source ===
> All of the core developers have experience with open source development.
> LinkedIn open sourced Kafka several months ago and has been receiving
> contributions since.  Jun is an Apache Cassandra committer and PMC member.
> Jay and Neha have been involved with several open source projects released
> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> Hadoop committer and PMC member.
>
> === Homogeneous Developers ===
> The current core developers are all from LinkedIn.  However, we hope to
> establish a developer community that includes contributors from several
> corporations, and we are actively encouraging new contributors via the
> mailing lists and public presentations of Kafka.
>
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to work on Kafka.  Once the project has
> a community built around it, we expect to attract committers, developers
> and community members from outside the current core developers.  However,
> because LinkedIn relies on Kafka internally, the reliance on salaried
> developers is unlikely to change.
>
> === Relationships with Other Apache Products ===
> Kafka is deeply integrated with Apache products.  Kafka uses Apache
> ZooKeeper to coordinate its state amongst the brokers, consumers and, soon,
> the producers.  Kafka provides input formats that allow Hadoop MapReduce to
> load data directly from Kafka, and an appender that allows Kafka to ingest
> data directly from Apache log4j.
>
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Kafka a solid home as an open source project following an established
> development model.  We have also given reasons in the Rationale and
> Alignment sections.
>
> == Documentation ==
> Information about Kafka can be found at [http://sna-projects.com/kafka/].
> The following links provide more information about the project:
>
>  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>  * The GitHub site: [https://github.com/kafka-dev/kafka]
>  * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>  * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>
> == Initial Source ==
> Kafka has been under development at LinkedIn since November 2009.  It was
> open sourced by LinkedIn in January 2011.  It is currently hosted on GitHub
> under the Apache License at [https://github.com/kafka-dev/kafka].
>
> Kafka is mainly written in Scala, with some performance testing code in
> Java.  Several clients have been contributed in other languages, including
> Ruby, PHP, Clojure, .NET and Python.  Its source tree is entirely
> self-contained and relies on the Simple Build Tool (sbt) as its build system
> and dependency resolution mechanism.
>
> == External Dependencies ==
> The dependencies all have Apache compatible licenses.
>
> == Cryptography ==
> Not applicable.
>
> == Required Resources ==
> === Mailing Lists ===
>  * kafka-private for private PMC discussions (with moderated subscriptions)
>  * kafka-dev
>  * kafka-commits
>  * kafka-user
>
> === Subversion Directory ===
> [https://svn.apache.org/repos/asf/incubator/kafka]
>
> === Issue Tracking ===
> JIRA Kafka (KAFKA)
>
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance
> to run them whenever a new patch is submitted.  This can be added after
> project creation.
>
> == Initial Committers ==
>  * Jay Kreps
>  * Jun Rao
>  * Neha Narkhede
>  * Jakob Homan
>
> == Affiliations ==
>  * Jay Kreps (LinkedIn)
>  * Jun Rao (LinkedIn)
>  * Neha Narkhede (LinkedIn)
>  * Jakob Homan (LinkedIn)
>
> == Sponsors ==
> === Champion ===
> Chris Douglas (Apache Member)
>
> === Nominated Mentors ===
>  * Alan Cabrera (Apache Member)
>  * Geir Magnusson, Jr. (Apache Member and Director)
>  * Owen O'Malley (Apache Member)
>
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.
>
