Date: Tue, 28 Jun 2011 11:01:52 -0700
Subject: Re: [VOTE] Kafka to join the Incubator
From: Henry Saputra
To: general@incubator.apache.org

+1

- Henry

On Tue, Jun 28, 2011 at 10:57 AM, Joe Key wrote:
> +1
>
> Sincerely,
> J. Andrew Key (Andy)
>
> On Tue, Jun 28, 2011 at 10:32 AM, Mattmann, Chris A (388J) <chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> +1 (binding).
>>
>> Thanks!
>>
>> Cheers,
>> Chris
>>
>> On Jun 28, 2011, at 10:00 AM, Jun Rao wrote:
>>
>> > Hi all,
>> >
>> >
>> > Since the discussion on the thread of the Kafka incubator proposal is
>> > winding down, I'd like to call a vote.
>> >
>> > At the end of this mail, I've put a copy of the current proposal. Here is
>> > a link to the document in the wiki:
>> > http://wiki.apache.org/incubator/KafkaProposal
>> >
>> > And here is a link to the discussion thread:
>> > http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
>> >
>> > Please cast your votes:
>> >
>> > [  ] +1 Accept Kafka for incubation
>> > [  ] +0 Indifferent to Kafka incubation
>> > [  ] -1 Reject Kafka for incubation
>> >
>> > This vote will close 72 hours from now.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > == Abstract ==
>> > Kafka is a distributed publish-subscribe system for processing large amounts
>> > of streaming data.
>> >
>> > == Proposal ==
>> > Kafka provides an extremely high throughput distributed publish/subscribe
>> > messaging system.
>> > Additionally, it supports relatively long-term
>> > persistence of messages to support a wide variety of consumers, partitioning
>> > of the message stream across servers and consumers, and functionality for
>> > loading data into Apache Hadoop for offline, batch processing.
>> >
>> > == Background ==
>> > Kafka was developed at LinkedIn to process the large amounts of events
>> > generated by that company's website and provide a common repository for many
>> > types of consumers to access and process those events. Kafka has been used
>> > in production at LinkedIn scale to handle dozens of types of events
>> > including page views, searches and social network activity. Kafka clusters
>> > at LinkedIn currently process more than two billion events per day.
>> >
>> > Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
>> > provide low latency message delivery but don't focus on throughput, and log
>> > processing systems such as Scribe and Flume, which do not provide adequate
>> > latency for our diverse set of consumers. Kafka can also be inserted into
>> > traditional log-processing systems, acting as an intermediate step before
>> > further processing. Kafka focuses relentlessly on performance and throughput
>> > by neither introspecting message content nor indexing messages on the broker.
>> > We also achieve high performance by depending on Java's sendFile/transferTo
>> > capabilities to minimize intermediate buffer copies and relying on the OS's
>> > pagecache to efficiently serve up message contents to consumers. Kafka is
>> > also designed to be scalable and it depends on Apache ZooKeeper for
>> > coordination amongst its producers, brokers and consumers.
>> >
>> > Kafka is written in Scala.
>> > It was developed internally at LinkedIn to meet
>> > our particular use cases, but will be useful to many organizations facing a
>> > similar need to reliably process large amounts of streaming data.
>> > Therefore, we would like to share it with the ASF and begin developing a
>> > community of developers and users within Apache.
>> >
>> > == Rationale ==
>> > Many organizations can benefit from a reliable stream processing system such
>> > as Kafka. While our use case of processing events from a very large website
>> > like LinkedIn has driven the design of Kafka, its uses are varied and we
>> > expect many new use cases to emerge. Kafka provides a natural bridge
>> > between near real-time event processing and offline batch processing and
>> > will appeal to many users.
>> >
>> > == Current Status ==
>> > === Meritocracy ===
>> > Our intent with this incubator proposal is to start building a diverse
>> > developer community around Kafka following the Apache meritocracy model.
>> > Since Kafka was open sourced we have solicited contributions via the website
>> > and presentations given to user groups and technical audiences. We have had
>> > positive responses to these and have received several contributions and
>> > clients for other languages. We plan to continue this support for new
>> > contributors and work with those who contribute significantly to the project
>> > to make them committers.
>> >
>> > === Community ===
>> > Kafka is currently being developed by engineers within LinkedIn and
>> > used in production in that company. Additionally, we have active users in or
>> > have received contributions from a diverse set of companies including
>> > MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
>> > presentations of Kafka and its goals garnered much interest from potential
>> > contributors.
>> > We hope to extend our contributor base significantly and
>> > invite all those who are interested in building high-throughput distributed
>> > systems to participate. We have begun receiving contributions from outside
>> > of LinkedIn, including clients for several languages such as Ruby, PHP,
>> > Clojure, .NET and Python.
>> >
>> > To further this goal, we use GitHub issue tracking and branching facilities,
>> > as well as maintaining a public mailing list via Google Groups.
>> >
>> > === Core Developers ===
>> > Kafka is currently being developed by four engineers at LinkedIn: Neha
>> > Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
>> > Apache as a Cassandra committer and PMC member. Neha has been an active
>> > contributor to several projects LinkedIn has open sourced, including Bobo,
>> > Sensei and Zoie. Jay has experience with open source software as the
>> > originator of Project Voldemort, as well as being active within
>> > the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
>> > member and a previous Apache ZooKeeper contributor.
>> >
>> > === Alignment ===
>> > The ASF is the natural choice to host the Kafka project as its goal of
>> > encouraging community-driven open-source projects fits with our vision for
>> > Kafka. Additionally, many other projects with which we are familiar
>> > and with which we expect Kafka to integrate, such as Apache Hadoop, Pig,
>> > ZooKeeper and log4j, are hosted by the ASF, and we will benefit and provide
>> > benefit by close proximity to them.
>> >
>> > == Known Risks ==
>> > === Orphaned Products ===
>> > The core developers plan to work full time on the project. There is very
>> > little risk of Kafka being abandoned as it is a critical part of LinkedIn's
>> > internal infrastructure and is in production use.
>> >
>> > === Inexperience with Open Source ===
>> > All of the core developers have experience with open source development.
>> > LinkedIn open sourced Kafka several months ago and has been receiving
>> > contributions since. Jun is an Apache Cassandra committer and PMC member.
>> > Jay and Neha have been involved with several open source projects released
>> > by LinkedIn. Jakob has been actively involved with the ASF as a full-time
>> > Hadoop committer and PMC member.
>> >
>> > === Homogeneous Developers ===
>> > The current core developers are all from LinkedIn. However, we hope to
>> > establish a developer community that includes contributors from several
>> > corporations and we are actively encouraging new contributors via the mailing
>> > lists and public presentations of Kafka.
>> >
>> > === Reliance on Salaried Developers ===
>> > Currently, the developers are paid to do work on Kafka. However, once the
>> > project has a community built around it, we expect to get committers,
>> > developers and community from outside the current core developers. However,
>> > because LinkedIn relies on Kafka internally, the reliance on salaried
>> > developers is unlikely to change.
>> >
>> > === Relationships with Other Apache Products ===
>> > Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
>> > to coordinate its state amongst the brokers, consumers, and soon, the
>> > producers. Kafka provides input formats to allow Hadoop MapReduce to load
>> > data directly from Kafka. Kafka provides an appender to allow consuming
>> > data directly from Apache log4j.
>> >
>> > === An Excessive Fascination with the Apache Brand ===
>> > While we respect the reputation of the Apache brand and have no doubts that
>> > it will attract contributors and users, our interest is primarily to give
>> > Kafka a solid home as an open source project following an established
>> > development model. We have also given reasons in the Rationale and Alignment
>> > sections.
>> >
>> > == Documentation ==
>> > Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
>> > following links provide more information about the project:
>> >
>> > * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>> > * The GitHub site: [https://github.com/kafka-dev/kafka]
>> > * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>> > * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>> > * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>> >
>> > == Initial Source ==
>> > Kafka has been under development at LinkedIn since November 2009. It was
>> > open sourced by LinkedIn in January 2011. It is currently hosted on GitHub
>> > under the Apache license at [https://github.com/kafka-dev/kafka]
>> >
>> > Kafka is mainly written in Scala with some performance testing code in Java.
>> > Several clients have been contributed in other languages, including Ruby,
>> > PHP, Clojure, .NET and Python. Its source tree is entirely self-contained
>> > and relies on simple build tool (sbt) as its build system and dependency
>> > resolution mechanism.
>> >
>> > == External Dependencies ==
>> > The dependencies all have Apache compatible licenses.
>> >
>> > == Cryptography ==
>> > Not applicable.
>> >
>> > == Required Resources ==
>> > === Mailing Lists ===
>> > * kafka-private for private PMC discussions (with moderated subscriptions)
>> > * kafka-dev
>> > * kafka-commits
>> > * kafka-user
>> >
>> > === Subversion Directory ===
>> > [https://svn.apache.org/repos/asf/incubator/kafka]
>> >
>> > === Issue Tracking ===
>> > JIRA Kafka (KAFKA)
>> >
>> > === Other Resources ===
>> > The existing code already has unit tests, so we would like a Hudson instance
>> > to run them whenever a new patch is submitted. This can be added after
>> > project creation.
>> >
>> > == Initial Committers ==
>> > * Jay Kreps
>> > * Jun Rao
>> > * Neha Narkhede
>> > * Jakob Homan
>> > * Phillip Rhodes
>> > * Henry Saputra
>> > * Chris Burroughs
>> >
>> > == Affiliations ==
>> > * Jay Kreps (LinkedIn)
>> > * Jun Rao (LinkedIn)
>> > * Neha Narkhede (LinkedIn)
>> > * Jakob Homan (LinkedIn)
>> > * Phillip Rhodes (Fogbeam Labs)
>> > * Henry Saputra (Cisco Systems)
>> > * Chris Burroughs (Clearspring Technologies)
>> >
>> > == Sponsors ==
>> > === Champion ===
>> > Chris Douglas (Apache Member)
>> >
>> > === Nominated Mentors ===
>> > * Alan Cabrera (Apache Member)
>> > * Geir Magnusson, Jr. (Apache Member and Director)
>> > * Owen O'Malley (Apache Member)
>> >
>> > === Sponsoring Entity ===
>> > We are requesting the Incubator to sponsor this project.
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>>
>>
>
>
> --
> Joe Andrew Key (Andy)
>

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org