Date: Wed, 29 Jun 2011 13:17:53 +0200
Subject: Re: [VOTE] Kafka to join the Incubator
From: Richard Hirsch
To: general@incubator.apache.org

+1 (Binding)

By the way, it is great to see another Scala-based project coming to Apache.

Dick
VP Apache ESME

On Wed, Jun 29, 2011 at 12:35 PM, Tommaso Teofili wrote:
> +1 (binding)
> Tommaso
>
> 2011/6/28 Jun Rao
>
>> Hi all,
>>
>> Since the discussion on the thread of the Kafka incubator proposal is
>> winding down, I'd like to call a vote.
>>
>> At the end of this mail, I've put a copy of the current proposal. Here is
>> a link to the document in the wiki:
>> http://wiki.apache.org/incubator/KafkaProposal
>>
>> And here is a link to the discussion thread:
>> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
>>
>> Please cast your votes:
>>
>> [  ] +1 Accept Kafka for incubation
>> [  ] +0 Indifferent to Kafka incubation
>> [  ] -1 Reject Kafka for incubation
>>
>> This vote will close 72 hours from now.
>>
>> Thanks,
>>
>> Jun
>>
>> == Abstract ==
>> Kafka is a distributed publish-subscribe system for processing large
>> amounts of streaming data.
>>
>> == Proposal ==
>> Kafka provides an extremely high throughput distributed publish/subscribe
>> messaging system. Additionally, it supports relatively long term
>> persistence of messages to support a wide variety of consumers,
>> partitioning of the message stream across servers and consumers, and
>> functionality for loading data into Apache Hadoop for offline, batch
>> processing.
>>
>> == Background ==
>> Kafka was developed at LinkedIn to process the large number of events
>> generated by that company's website and provide a common repository for
>> many types of consumers to access and process those events. Kafka has been
>> used in production at LinkedIn scale to handle dozens of types of events
>> including page views, searches and social network activity. Kafka clusters
>> at LinkedIn currently process more than two billion events per day.
>>
>> Kafka fills the gap between messaging systems such as Apache ActiveMQ,
>> which provide low latency message delivery but don't focus on throughput,
>> and log processing systems such as Scribe and Flume, which do not provide
>> adequate latency for our diverse set of consumers. Kafka can also be
>> inserted into traditional log-processing systems, acting as an
>> intermediate step before further processing. Kafka focuses relentlessly on
>> performance and throughput by neither introspecting message content nor
>> indexing messages on the broker. We also achieve high performance by
>> depending on Java's sendFile/transferTo capabilities to minimize
>> intermediate buffer copies and relying on the OS's pagecache to
>> efficiently serve up message contents to consumers. Kafka is also designed
>> to be scalable and it depends on Apache ZooKeeper for coordination amongst
>> its producers, brokers and consumers.
>>
>> Kafka is written in Scala. It was developed internally at LinkedIn to meet
>> our particular use cases, but will be useful to many organizations facing
>> a similar need to reliably process large amounts of streaming data.
>> Therefore, we would like to share it with the ASF and begin developing a
>> community of developers and users within Apache.
>>
>> == Rationale ==
>> Many organizations can benefit from a reliable stream processing system
>> such as Kafka.
>> While our use case of processing events from a very large website like
>> LinkedIn has driven the design of Kafka, its uses are varied and we
>> expect many new use cases to emerge. Kafka provides a natural bridge
>> between near real-time event processing and offline batch processing and
>> will appeal to many users.
>>
>> == Current Status ==
>> === Meritocracy ===
>> Our intent with this incubator proposal is to start building a diverse
>> developer community around Kafka following the Apache meritocracy model.
>> Since Kafka was open sourced we have solicited contributions via the
>> website and presentations given to user groups and technical audiences.
>> We have had positive responses to these and have received several
>> contributions and clients for other languages. We plan to continue this
>> support for new contributors and work with those who contribute
>> significantly to the project to make them committers.
>>
>> === Community ===
>> Kafka is currently being developed by engineers within LinkedIn and is
>> used in production at that company. Additionally, we have active users in
>> or have received contributions from a diverse set of companies including
>> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
>> presentations of Kafka and its goals garnered much interest from potential
>> contributors. We hope to extend our contributor base significantly and
>> invite all those who are interested in building high-throughput
>> distributed systems to participate. We have begun receiving contributions
>> from outside of LinkedIn, including clients for several languages such as
>> Ruby, PHP, Clojure, .NET and Python.
>>
>> To further this goal, we use GitHub issue tracking and branching
>> facilities, and maintain a public mailing list via Google Groups.
>>
>> === Core Developers ===
>> Kafka is currently being developed by four engineers at LinkedIn: Neha
>> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
>> Apache as a Cassandra committer and PMC member. Neha has been an active
>> contributor to several projects LinkedIn has open sourced, including Bobo,
>> Sensei and Zoie. Jay has experience with open source software as the
>> originator of Project Voldemort, as well as being active within the
>> Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
>> member and a previous Apache ZooKeeper contributor.
>>
>> === Alignment ===
>> The ASF is the natural choice to host the Kafka project as its goal of
>> encouraging community-driven open-source projects fits with our vision for
>> Kafka. Additionally, many other projects with which we are familiar and
>> with which we expect Kafka to integrate, such as Apache Hadoop, Pig,
>> ZooKeeper and log4j, are hosted by the ASF and we will benefit and provide
>> benefit by close proximity to them.
>>
>> == Known Risks ==
>> === Orphaned Products ===
>> The core developers plan to work full time on the project. There is very
>> little risk of Kafka being abandoned as it is a critical part of
>> LinkedIn's internal infrastructure and is in production use.
>>
>> === Inexperience with Open Source ===
>> All of the core developers have experience with open source development.
>> LinkedIn open sourced Kafka several months ago and has been receiving
>> contributions since. Jun is an Apache Cassandra committer and PMC member.
>> Jay and Neha have been involved with several open source projects released
>> by LinkedIn. Jakob has been actively involved with the ASF as a full-time
>> Hadoop committer and PMC member.
>>
>> === Homogeneous Developers ===
>> The current core developers are all from LinkedIn.
>> However, we hope to establish a developer community that includes
>> contributors from several corporations and we are actively encouraging
>> new contributors via the mailing lists and public presentations of Kafka.
>>
>> === Reliance on Salaried Developers ===
>> Currently, the developers are paid to do work on Kafka. Once the project
>> has a community built around it, we expect to get committers, developers
>> and community from outside the current core developers. However, because
>> LinkedIn relies on Kafka internally, the reliance on salaried developers
>> is unlikely to change.
>>
>> === Relationships with Other Apache Products ===
>> Kafka is deeply integrated with Apache products. Kafka uses Apache
>> ZooKeeper to coordinate its state amongst the brokers, consumers, and
>> soon, the producers. Kafka provides input formats to allow Hadoop
>> MapReduce to load data directly from Kafka. Kafka provides an appender to
>> allow consuming data directly from Apache log4j.
>>
>> === An Excessive Fascination with the Apache Brand ===
>> While we respect the reputation of the Apache brand and have no doubts
>> that it will attract contributors and users, our interest is primarily to
>> give Kafka a solid home as an open source project following an established
>> development model. We have also given reasons in the Rationale and
>> Alignment sections.
>>
>> == Documentation ==
>> Information about Kafka can be found at [http://sna-projects.com/kafka/].
>> The following links provide more information about the project:
>>
>>  * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>>  * The GitHub site: [https://github.com/kafka-dev/kafka]
>>  * Kafka overview from Jay Kreps:
>>    [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>>  * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>>  * Kafka paper at NetDB 2011:
>>    [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>>
>> == Initial Source ==
>> Kafka has been under development at LinkedIn since November 2009. It was
>> open sourced by LinkedIn in January 2011. It is currently hosted on GitHub
>> under the Apache license at [https://github.com/kafka-dev/kafka]
>>
>> Kafka is mainly written in Scala with some performance testing code in
>> Java. Several clients have been contributed in other languages, including
>> Ruby, PHP, Clojure, .NET and Python. Its source tree is entirely
>> self-contained and relies on the simple build tool (sbt) as its build
>> system and dependency resolution mechanism.
>>
>> == External Dependencies ==
>> The dependencies all have Apache compatible licenses.
>>
>> == Cryptography ==
>> Not applicable.
>>
>> == Required Resources ==
>> === Mailing Lists ===
>>  * kafka-private for private PMC discussions (with moderated subscriptions)
>>  * kafka-dev
>>  * kafka-commits
>>  * kafka-user
>>
>> === Subversion Directory ===
>> [https://svn.apache.org/repos/asf/incubator/kafka]
>>
>> === Issue Tracking ===
>> JIRA Kafka (KAFKA)
>>
>> === Other Resources ===
>> The existing code already has unit tests, so we would like a Hudson
>> instance to run them whenever a new patch is submitted.
>> This can be added after project creation.
>>
>> == Initial Committers ==
>>  * Jay Kreps
>>  * Jun Rao
>>  * Neha Narkhede
>>  * Jakob Homan
>>  * Phillip Rhodes
>>  * Henry Saputra
>>  * Chris Burroughs
>>
>> == Affiliations ==
>>  * Jay Kreps (LinkedIn)
>>  * Jun Rao (LinkedIn)
>>  * Neha Narkhede (LinkedIn)
>>  * Jakob Homan (LinkedIn)
>>  * Phillip Rhodes (Fogbeam Labs)
>>  * Henry Saputra (Cisco Systems)
>>  * Chris Burroughs (Clearspring Technologies)
>>
>> == Sponsors ==
>> === Champion ===
>> Chris Douglas (Apache Member)
>>
>> === Nominated Mentors ===
>>  * Alan Cabrera (Apache Member)
>>  * Geir Magnusson, Jr. (Apache Member and Director)
>>  * Owen O'Malley (Apache Member)
>>
>> === Sponsoring Entity ===
>> We are requesting the Incubator to sponsor this project.
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org