Date: Tue, 28 Jun 2011 10:57:10 -0700
Subject: Re: [VOTE] Kafka to join the Incubator
From: Joe Key
To: general@incubator.apache.org
Reply-To: general@incubator.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=00504502b2d19f437504a6c9650e

--00504502b2d19f437504a6c9650e
Content-Type: text/plain; charset=ISO-8859-1

+1

Sincerely,
J. Andrew Key (Andy)

On Tue, Jun 28, 2011 at 10:32 AM, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> +1 (binding).
>
> Thanks!
>
> Cheers,
> Chris
>
> On Jun 28, 2011, at 10:00 AM, Jun Rao wrote:
>
> > Hi all,
> >
> > Since the discussion on the thread of the Kafka incubator proposal is
> > winding down, I'd like to call a vote.
> >
> > At the end of this mail, I've put a copy of the current proposal. Here is
> > a link to the document in the wiki:
> > http://wiki.apache.org/incubator/KafkaProposal
> >
> > And here is a link to the discussion thread:
> > http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
> >
> > Please cast your votes:
> >
> > [ ] +1 Accept Kafka for incubation
> > [ ] +0 Indifferent to Kafka incubation
> > [ ] -1 Reject Kafka for incubation
> >
> > This vote will close 72 hours from now.
> >
> > Thanks,
> >
> > Jun
> >
> > == Abstract ==
> > Kafka is a distributed publish-subscribe system for processing large
> > amounts of streaming data.
> >
> > == Proposal ==
> > Kafka provides an extremely high throughput distributed publish/subscribe
> > messaging system. Additionally, it supports relatively long term
> > persistence of messages to support a wide variety of consumers,
> > partitioning of the message stream across servers and consumers, and
> > functionality for loading data into Apache Hadoop for offline, batch
> > processing.
> >
> > == Background ==
> > Kafka was developed at LinkedIn to process the large amounts of events
> > generated by that company's website and provide a common repository for
> > many types of consumers to access and process those events. Kafka has
> > been used in production at LinkedIn scale to handle dozens of types of
> > events including page views, searches and social network activity. Kafka
> > clusters at LinkedIn currently process more than two billion events per
> > day.
> >
> > Kafka fills the gap between messaging systems such as Apache ActiveMQ,
> > which provide low latency message delivery but don't focus on throughput,
> > and log processing systems such as Scribe and Flume, which do not provide
> > adequate latency for our diverse set of consumers. Kafka can also be
> > inserted into traditional log-processing systems, acting as an
> > intermediate step before further processing. Kafka focuses relentlessly
> > on performance and throughput by neither introspecting message content
> > nor indexing messages on the broker. We also achieve high performance by
> > depending on Java's sendFile/transferTo capabilities to minimize
> > intermediate buffer copies and relying on the OS's pagecache to
> > efficiently serve up message contents to consumers. Kafka is also
> > designed to be scalable and it depends on Apache ZooKeeper for
> > coordination amongst its producers, brokers and consumers.
> >
> > Kafka is written in Scala. It was developed internally at LinkedIn to
> > meet our particular use cases, but will be useful to many organizations
> > facing a similar need to reliably process large amounts of streaming
> > data. Therefore, we would like to share it with the ASF and begin
> > developing a community of developers and users within Apache.
> >
> > == Rationale ==
> > Many organizations can benefit from a reliable stream processing system
> > such as Kafka.
> > While our use case of processing events from a very large website
> > like LinkedIn has driven the design of Kafka, its uses are varied and we
> > expect many new use cases to emerge. Kafka provides a natural bridge
> > between near real-time event processing and offline batch processing and
> > will appeal to many users.
> >
> > == Current Status ==
> > === Meritocracy ===
> > Our intent with this incubator proposal is to start building a diverse
> > developer community around Kafka following the Apache meritocracy model.
> > Since Kafka was open sourced we have solicited contributions via the
> > website and presentations given to user groups and technical audiences.
> > We have had positive responses to these and have received several
> > contributions and clients for other languages. We plan to continue this
> > support for new contributors and work with those who contribute
> > significantly to the project to make them committers.
> >
> > === Community ===
> > Kafka is currently being developed by engineers within LinkedIn and
> > used in production at that company. Additionally, we have active users in
> > or have received contributions from a diverse set of companies including
> > MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> > presentations of Kafka and its goals garnered much interest from
> > potential contributors. We hope to extend our contributor base
> > significantly and invite all those who are interested in building
> > high-throughput distributed systems to participate. We have begun
> > receiving contributions from outside of LinkedIn, including clients for
> > several languages such as Ruby, PHP, Clojure, .NET and Python.
> >
> > To further this goal, we use GitHub issue tracking and branching
> > facilities, as well as maintaining a public mailing list via Google
> > Groups.
> >
> > === Core Developers ===
> > Kafka is currently being developed by four engineers at LinkedIn: Neha
> > Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> > Apache as a Cassandra committer and PMC member. Neha has been an active
> > contributor to several projects LinkedIn has open sourced, including
> > Bobo, Sensei and Zoie. Jay has experience with open source software as
> > the originator of Project Voldemort, as well as being active within
> > the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and
> > PMC member and a previous Apache ZooKeeper contributor.
> >
> > === Alignment ===
> > The ASF is the natural choice to host the Kafka project as its goal of
> > encouraging community-driven open-source projects fits with our vision
> > for Kafka. Additionally, many other projects with which we are familiar
> > and with which we expect Kafka to integrate, such as Apache Hadoop, Pig,
> > ZooKeeper and log4j, are hosted by the ASF and we will benefit and
> > provide benefit by close proximity to them.
> >
> > == Known Risks ==
> > === Orphaned Products ===
> > The core developers plan to work full time on the project. There is very
> > little risk of Kafka being abandoned as it is a critical part of
> > LinkedIn's internal infrastructure and is in production use.
> >
> > === Inexperience with Open Source ===
> > All of the core developers have experience with open source development.
> > LinkedIn open sourced Kafka several months ago and has been receiving
> > contributions since. Jun is an Apache Cassandra committer and PMC
> > member. Jay and Neha have been involved with several open source projects
> > released by LinkedIn. Jakob has been actively involved with the ASF as a
> > full-time Hadoop committer and PMC member.
> >
> > === Homogeneous Developers ===
> > The current core developers are all from LinkedIn.
> > However, we hope to establish a developer community that includes
> > contributors from several corporations and we are actively encouraging
> > new contributors via the mailing lists and public presentations of Kafka.
> >
> > === Reliance on Salaried Developers ===
> > Currently, the developers are paid to do work on Kafka. However, once the
> > project has a community built around it, we expect to get committers,
> > developers and community from outside the current core developers.
> > However, because LinkedIn relies on Kafka internally, the reliance on
> > salaried developers is unlikely to change.
> >
> > === Relationships with Other Apache Products ===
> > Kafka is deeply integrated with Apache products. Kafka uses Apache
> > ZooKeeper to coordinate its state amongst the brokers, consumers, and
> > soon, the producers. Kafka provides input formats to allow Hadoop
> > MapReduce to load data directly from Kafka. Kafka provides an appender to
> > allow consuming data directly from Apache log4j.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > While we respect the reputation of the Apache brand and have no doubts
> > that it will attract contributors and users, our interest is primarily to
> > give Kafka a solid home as an open source project following an
> > established development model. We have also given reasons in the
> > Rationale and Alignment sections.
> >
> > == Documentation ==
> > Information about Kafka can be found at [http://sna-projects.com/kafka/].
> > The following links provide more information about the project:
> >
> > * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
> > * The GitHub site: [https://github.com/kafka-dev/kafka]
> > * Kafka overview from Jay Kreps:
> >   [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
> > * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
> > * Kafka paper at NetDB 2011:
> >   [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
> >
> > == Initial Source ==
> > Kafka has been under development at LinkedIn since November 2009. It was
> > open sourced by LinkedIn in January 2011. It is currently hosted on
> > GitHub under the Apache license at [https://github.com/kafka-dev/kafka]
> >
> > Kafka is mainly written in Scala with some performance testing code in
> > Java. Several clients have been contributed in other languages, including
> > Ruby, PHP, Clojure, .NET and Python. Its source tree is entirely
> > self-contained and relies on simple build tool (sbt) as its build system
> > and dependency resolution mechanism.
> >
> > == External Dependencies ==
> > The dependencies all have Apache compatible licenses.
> >
> > == Cryptography ==
> > Not applicable.
> >
> > == Required Resources ==
> > === Mailing Lists ===
> > * kafka-private for private PMC discussions (with moderated
> >   subscriptions)
> > * kafka-dev
> > * kafka-commits
> > * kafka-user
> >
> > === Subversion Directory ===
> > [https://svn.apache.org/repos/asf/incubator/kafka]
> >
> > === Issue Tracking ===
> > JIRA Kafka (KAFKA)
> >
> > === Other Resources ===
> > The existing code already has unit tests, so we would like a Hudson
> > instance to run them whenever a new patch is submitted. This can be added
> > after project creation.
> >
> > == Initial Committers ==
> > * Jay Kreps
> > * Jun Rao
> > * Neha Narkhede
> > * Jakob Homan
> > * Phillip Rhodes
> > * Henry Saputra
> > * Chris Burroughs
> >
> > == Affiliations ==
> > * Jay Kreps (LinkedIn)
> > * Jun Rao (LinkedIn)
> > * Neha Narkhede (LinkedIn)
> > * Jakob Homan (LinkedIn)
> > * Phillip Rhodes (Fogbeam Labs)
> > * Henry Saputra (Cisco Systems)
> > * Chris Burroughs (Clearspring Technologies)
> >
> > == Sponsors ==
> > === Champion ===
> > Chris Douglas (Apache Member)
> >
> > === Nominated Mentors ===
> > * Alan Cabrera (Apache Member)
> > * Geir Magnusson, Jr. (Apache Member and Director)
> > * Owen O'Malley (Apache Member)
> >
> > === Sponsoring Entity ===
> > We are requesting the Incubator to sponsor this project.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>

--
Joe Andrew Key (Andy)

--00504502b2d19f437504a6c9650e--