incubator-general mailing list archives

From Richard Hirsch <hirsch.d...@gmail.com>
Subject Re: [VOTE] Kafka to join the Incubator
Date Wed, 29 Jun 2011 11:17:53 GMT
+1 (Binding)

By the way, it is great to see another Scala-based project coming to Apache.

Dick

VP Apache ESME

On Wed, Jun 29, 2011 at 12:35 PM, Tommaso Teofili
<tommaso.teofili@gmail.com> wrote:
> +1 (binding)
> Tommaso
>
> 2011/6/28 Jun Rao <junrao@gmail.com>
>
>> Hi all,
>>
>>
>> Since the discussion on the thread of the Kafka incubator proposal is
>> winding down, I'd like to call a vote.
>>
>> At the end of this mail, I've put a copy of the current proposal.  Here is
>> a link to the document in the wiki:
>> http://wiki.apache.org/incubator/KafkaProposal
>>
>> And here is a link to the discussion thread:
>> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
>>
>> Please cast your votes:
>>
>> [  ] +1 Accept Kafka for incubation
>> [  ] +0 Indifferent to Kafka incubation
>> [  ]  -1 Reject Kafka for incubation
>>
>> This vote will close 72 hours from now.
>>
>> Thanks,
>>
>> Jun
>>
>> == Abstract ==
>> Kafka is a distributed publish-subscribe system for processing large
>> amounts of streaming data.
>>
>> == Proposal ==
>> Kafka provides an extremely high throughput distributed publish/subscribe
>> messaging system.  Additionally, it supports relatively long term
>> persistence of messages to support a wide variety of consumers,
>> partitioning of the message stream across servers and consumers, and
>> functionality for loading data into Apache Hadoop for offline, batch
>> processing.
>>
>> == Background ==
>> Kafka was developed at LinkedIn to process the large amounts of events
>> generated by that company's website and provide a common repository for
>> many types of consumers to access and process those events. Kafka has
>> been used in production at LinkedIn scale to handle dozens of types of
>> events including page views, searches and social network activity. Kafka
>> clusters at LinkedIn currently process more than two billion events per
>> day.
>>
>> Kafka fills the gap between messaging systems such as Apache ActiveMQ,
>> which provide low latency message delivery but don't focus on throughput,
>> and log processing systems such as Scribe and Flume, which do not provide
>> adequate latency for our diverse set of consumers.  Kafka can also be
>> inserted into traditional log-processing systems, acting as an
>> intermediate step before further processing. Kafka focuses relentlessly on
>> performance and throughput by neither introspecting message content nor
>> indexing messages on the broker.  We also achieve high performance by
>> depending on Java's sendFile/transferTo capabilities to minimize
>> intermediate buffer copies and relying on the OS's pagecache to
>> efficiently serve up message contents to consumers. Kafka is also designed
>> to be scalable and it depends on Apache ZooKeeper for coordination amongst
>> its producers, brokers and consumers.
>>
>> Kafka is written in Scala. It was developed internally at LinkedIn to meet
>> our particular use cases, but will be useful to many organizations facing a
>> similar need to reliably process large amounts of streaming data.
>>  Therefore, we would like to share it with the ASF and begin developing a
>> community of developers and users within Apache.
>>
>> == Rationale ==
>> Many organizations can benefit from a reliable stream processing system
>> such as Kafka.  While our use case of processing events from a very large
>> website like LinkedIn has driven the design of Kafka, its uses are varied
>> and we expect many new use cases to emerge.  Kafka provides a natural
>> bridge between near real-time event processing and offline batch
>> processing and will appeal to many users.
>>
>> == Current Status ==
>> === Meritocracy ===
>> Our intent with this incubator proposal is to start building a diverse
>> developer community around Kafka following the Apache meritocracy model.
>> Since Kafka was open sourced we have solicited contributions via the
>> website and presentations given to user groups and technical audiences.
>> We have had positive responses to these and have received several
>> contributions and clients for other languages.  We plan to continue this
>> support for new contributors and work with those who contribute
>> significantly to the project to make them committers.
>>
>> === Community ===
>> Kafka is currently being developed by engineers within LinkedIn and is
>> used in production at that company. Additionally, we have active users in,
>> or have received contributions from, a diverse set of companies including
>> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
>> presentations of Kafka and its goals garnered much interest from potential
>> contributors. We hope to extend our contributor base significantly and
>> invite all those who are interested in building high-throughput distributed
>> systems to participate.  We have begun receiving contributions from outside
>> of LinkedIn, including clients for several languages such as Ruby, PHP,
>> Clojure, .NET and Python.
>>
>> To further this goal, we use GitHub issue tracking and branching
>> facilities, and we maintain a public mailing list via Google Groups.
>>
>> === Core Developers ===
>> Kafka is currently being developed by four engineers at LinkedIn: Neha
>> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
>> Apache as a Cassandra committer and PMC member. Neha has been an active
>> contributor to several projects LinkedIn has open sourced, including Bobo,
>> Sensei and Zoie. Jay has experience with open source software as the
>> originator of Project Voldemort, as well as being active within
>> the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
>> member and a previous Apache ZooKeeper contributor.
>>
>> === Alignment ===
>> The ASF is the natural choice to host the Kafka project as its goal of
>> encouraging community-driven open-source projects fits with our vision for
>> Kafka.  Additionally, many other projects with which we are familiar and
>> with which we expect Kafka to integrate, such as Apache Hadoop, Pig,
>> ZooKeeper and log4j, are hosted by the ASF, and we will benefit from and
>> provide benefit by close proximity to them.
>>
>> == Known Risks ==
>> === Orphaned Products ===
>> The core developers plan to work full time on the project. There is very
>> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
>> internal infrastructure and is in production use.
>>
>> === Inexperience with Open Source ===
>> All of the core developers have experience with open source development.
>>  LinkedIn open sourced Kafka several months ago and has been receiving
>> contributions since.  Jun is an Apache Cassandra committer and PMC member.
>>  Jay and Neha have been involved with several open source projects released
>> by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
>> Hadoop committer and PMC member.
>>
>> === Homogeneous Developers ===
>> The current core developers are all from LinkedIn. However, we hope to
>> establish a developer community that includes contributors from several
>> corporations and we are actively encouraging new contributors via the mailing
>> lists and public presentations of Kafka.
>>
>> === Reliance on Salaried Developers ===
>> Currently, the developers are paid to work on Kafka. Once the project has
>> a community built around it, we expect to gain committers, developers and
>> community members from outside the current core team. However,
>> because LinkedIn relies on Kafka internally, the reliance on salaried
>> developers is unlikely to change.
>>
>> === Relationships with Other Apache Products ===
>> Kafka is deeply integrated with Apache products. Kafka uses Apache
>> ZooKeeper to coordinate its state amongst the brokers, consumers, and
>> soon, the producers.  Kafka provides input formats to allow Hadoop
>> MapReduce to load data directly from Kafka.  Kafka provides an appender to
>> allow consuming data directly from Apache log4j.
>>
>> === An Excessive Fascination with the Apache Brand ===
>> While we respect the reputation of the Apache brand and have no doubts that
>> it will attract contributors and users, our interest is primarily to give
>> Kafka a solid home as an open source project following an established
>> development model. We have also given reasons in the Rationale and
>> Alignment sections.
>>
>> == Documentation ==
>> Information about Kafka can be found at [http://sna-projects.com/kafka/].
>> The following links provide more information about the project:
>>
>>  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>>  * The GitHub site: [https://github.com/kafka-dev/kafka]
>>  * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>>  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>>  * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>>
>> == Initial Source ==
>> Kafka has been under development at LinkedIn since November 2009.  It was
>> open sourced by LinkedIn in January 2011.  It is currently hosted on GitHub
>> under the Apache license at [https://github.com/kafka-dev/kafka].
>>
>> Kafka is mainly written in Scala with some performance testing code in
>> Java.  Several clients have been contributed in other languages, including
>> Ruby, PHP, Clojure, .NET and Python.  Its source tree is entirely
>> self-contained and relies on the simple build tool (sbt) as its build
>> system and dependency resolution mechanism.
>>
>> == External Dependencies ==
>> The dependencies all have Apache compatible licenses.
>>
>> == Cryptography ==
>> Not applicable.
>>
>> == Required Resources ==
>> === Mailing Lists ===
>>  * kafka-private for private PMC discussions (with moderated subscriptions)
>>  * kafka-dev
>>  * kafka-commits
>>  * kafka-user
>>
>> === Subversion Directory ===
>> [https://svn.apache.org/repos/asf/incubator/kafka]
>>
>> === Issue Tracking ===
>> JIRA Kafka (KAFKA)
>>
>> === Other Resources ===
>> The existing code already has unit tests, so we would like a Hudson
>> instance to run them whenever a new patch is submitted. This can be added
>> after project creation.
>>
>> == Initial Committers ==
>>  * Jay Kreps
>>  * Jun Rao
>>  * Neha Narkhede
>>  * Jakob Homan
>>  * Phillip Rhodes
>>  * Henry Saputra
>>  * Chris Burroughs
>>
>> == Affiliations ==
>>  * Jay Kreps (LinkedIn)
>>  * Jun Rao (LinkedIn)
>>  * Neha Narkhede (LinkedIn)
>>  * Jakob Homan (LinkedIn)
>>  * Phillip Rhodes (Fogbeam Labs)
>>  * Henry Saputra (Cisco Systems)
>>  * Chris Burroughs (Clearspring Technologies)
>>
>> == Sponsors ==
>> === Champion ===
>> Chris Douglas (Apache Member)
>>
>> === Nominated Mentors ===
>>  * Alan Cabrera (Apache Member)
>>  * Geir Magnusson, Jr. (Apache Member and Director)
>>  * Owen O'Malley (Apache Member)
>>
>> === Sponsoring Entity ===
>> We are requesting the Incubator to sponsor this project.
>>
>


