incubator-general mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: [VOTE] Kafka to join the Incubator
Date Tue, 28 Jun 2011 18:01:52 GMT
+1

- Henry

On Tue, Jun 28, 2011 at 10:57 AM, Joe Key <joeandrewkey@gmail.com> wrote:
> +1
>
> Sincerely,
> J. Andrew Key (Andy)
>
> On Tue, Jun 28, 2011 at 10:32 AM, Mattmann, Chris A (388J) <chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> +1 (binding).
>>
>> Thanks!
>>
>> Cheers,
>> Chris
>>
>> On Jun 28, 2011, at 10:00 AM, Jun Rao wrote:
>>
>> > Hi all,
>> >
>> >
>> > Since the discussion on the thread of the Kafka incubator proposal is
>> > winding down, I'd like to call a vote.
>> >
>> > At the end of this mail, I've put a copy of the current proposal.  Here is
>> > a link to the document in the wiki:
>> > http://wiki.apache.org/incubator/KafkaProposal
>> >
>> > And here is a link to the discussion thread:
>> > http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
>> >
>> > Please cast your votes:
>> >
>> > [  ] +1 Accept Kafka for incubation
>> > [  ] +0 Indifferent to Kafka incubation
>> > [  ]  -1 Reject Kafka for incubation
>> >
>> > This vote will close 72 hours from now.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > == Abstract ==
>> > Kafka is a distributed publish-subscribe system for processing large amounts
>> > of streaming data.
>> >
>> > == Proposal ==
>> > Kafka provides an extremely high throughput distributed publish/subscribe
>> > messaging system.  Additionally, it supports relatively long term
>> > persistence of messages to support a wide variety of consumers, partitioning
>> > of the message stream across servers and consumers, and functionality for
>> > loading data into Apache Hadoop for offline, batch processing.
>> >
>> > == Background ==
>> > Kafka was developed at LinkedIn to process the large amounts of events
>> > generated by that company's website and provide a common repository for many
>> > types of consumers to access and process those events. Kafka has been used
>> > in production at LinkedIn scale to handle dozens of types of events
>> > including page views, searches and social network activity. Kafka clusters
>> > at LinkedIn currently process more than two billion events per day.
>> >
>> > Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
>> > provide low latency message delivery but don't focus on throughput, and log
>> > processing systems such as Scribe and Flume, which do not provide adequate
>> > latency for our diverse set of consumers.  Kafka can also be inserted into
>> > traditional log-processing systems, acting as an intermediate step before
>> > further processing. Kafka focuses relentlessly on performance and throughput
>> > by neither introspecting message content nor indexing messages on the broker.
>> > We also achieve high performance by depending on Java's sendFile/transferTo
>> > capabilities to minimize intermediate buffer copies and relying on the OS's
>> > pagecache to efficiently serve up message contents to consumers. Kafka is
>> > also designed to be scalable and it depends on Apache ZooKeeper for
>> > coordination amongst its producers, brokers and consumers.
>> >
>> > Kafka is written in Scala. It was developed internally at LinkedIn to meet
>> > our particular use cases, but will be useful to many organizations facing a
>> > similar need to reliably process large amounts of streaming data.
>> > Therefore, we would like to share it with the ASF and begin developing a
>> > community of developers and users within Apache.
>> >
>> > == Rationale ==
>> > Many organizations can benefit from a reliable stream processing system such
>> > as Kafka.  While our use case of processing events from a very large website
>> > like LinkedIn has driven the design of Kafka, its uses are varied and we
>> > expect many new use cases to emerge.  Kafka provides a natural bridge
>> > between near real-time event processing and offline batch processing and
>> > will appeal to many users.
>> >
>> > == Current Status ==
>> > === Meritocracy ===
>> > Our intent with this incubator proposal is to start building a diverse
>> > developer community around Kafka following the Apache meritocracy model.
>> > Since Kafka was open sourced we have solicited contributions via the website
>> > and presentations given to user groups and technical audiences.  We have had
>> > positive responses to these and have received several contributions and
>> > clients for other languages.  We plan to continue this support for new
>> > contributors and work with those who contribute significantly to the project
>> > to make them committers.
>> >
>> > === Community ===
>> > Kafka is currently being developed by engineers within LinkedIn and is
>> > used in production at that company. Additionally, we have active users in or
>> > have received contributions from a diverse set of companies including
>> > MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
>> > presentations of Kafka and its goals garnered much interest from potential
>> > contributors. We hope to extend our contributor base significantly and
>> > invite all those who are interested in building high-throughput distributed
>> > systems to participate.  We have begun receiving contributions from outside
>> > of LinkedIn, including clients for several languages such as Ruby, PHP,
>> > Clojure, .NET and Python.
>> >
>> > To further this goal, we use GitHub's issue tracking and branching facilities
>> > and maintain a public mailing list via Google Groups.
>> >
>> > === Core Developers ===
>> > Kafka is currently being developed by four engineers at LinkedIn: Neha
>> > Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
>> > Apache as a Cassandra committer and PMC member. Neha has been an active
>> > contributor to several projects LinkedIn has open sourced, including Bobo,
>> > Sensei and Zoie. Jay has experience with open source software as the
>> > originator of Project Voldemort, as well as being active within
>> > the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
>> > member and a previous Apache ZooKeeper contributor.
>> >
>> > === Alignment ===
>> > The ASF is the natural choice to host the Kafka project as its goal of
>> > encouraging community-driven open-source projects fits with our vision for
>> > Kafka.  Additionally, many other projects with which we are familiar and
>> > with which we expect Kafka to integrate, such as Apache Hadoop, Pig,
>> > ZooKeeper and log4j, are hosted by the ASF, and we will benefit and provide
>> > benefit by close proximity to them.
>> >
>> > == Known Risks ==
>> > === Orphaned Products ===
>> > The core developers plan to work full time on the project. There is very
>> > little risk of Kafka being abandoned as it is a critical part of LinkedIn's
>> > internal infrastructure and is in production use.
>> >
>> > === Inexperience with Open Source ===
>> > All of the core developers have experience with open source development.
>> > LinkedIn open sourced Kafka several months ago and has been receiving
>> > contributions since.  Jun is an Apache Cassandra committer and PMC member.
>> > Jay and Neha have been involved with several open source projects released
>> > by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
>> > Hadoop committer and PMC member.
>> >
>> > === Homogeneous Developers ===
>> > The current core developers are all from LinkedIn. However, we hope to
>> > establish a developer community that includes contributors from several
>> > corporations and we are actively encouraging new contributors via the mailing
>> > lists and public presentations of Kafka.
>> >
>> > === Reliance on Salaried Developers ===
>> > Currently, the developers are paid to do work on Kafka. Once the
>> > project has a community built around it, we expect to get committers,
>> > developers and community from outside the current core developers.
>> > However, because LinkedIn relies on Kafka internally, the reliance on
>> > salaried developers is unlikely to change.
>> >
>> > === Relationships with Other Apache Products ===
>> > Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
>> > to coordinate its state amongst the brokers, consumers, and soon, the
>> > producers.  Kafka provides input formats to allow Hadoop MapReduce to load
>> > data directly from Kafka.  Kafka provides an appender to allow consuming
>> > data directly from Apache log4j.
>> >
>> > === An Excessive Fascination with the Apache Brand ===
>> > While we respect the reputation of the Apache brand and have no doubts that
>> > it will attract contributors and users, our interest is primarily to give
>> > Kafka a solid home as an open source project following an established
>> > development model. We have also given reasons in the Rationale and Alignment
>> > sections.
>> >
>> > == Documentation ==
>> > Information about Kafka can be found at [http://sna-projects.com/kafka/].
>> > The following links provide more information about the project:
>> >
>> > * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>> > * The GitHub site: [https://github.com/kafka-dev/kafka]
>> > * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>> > * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>> > * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>> >
>> > == Initial Source ==
>> > Kafka has been under development at LinkedIn since November 2009.  It was
>> > open sourced by LinkedIn in January 2011.  It is currently hosted on GitHub
>> > under the Apache license at [https://github.com/kafka-dev/kafka].
>> >
>> > Kafka is mainly written in Scala with some performance testing code in Java.
>> > Several clients have been contributed in other languages, including Ruby,
>> > PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
>> > and relies on the simple build tool (sbt) as its build system and dependency
>> > resolution mechanism.
>> >
>> > == External Dependencies ==
>> > The dependencies all have Apache compatible licenses.
>> >
>> > == Cryptography ==
>> > Not applicable.
>> >
>> > == Required Resources ==
>> > === Mailing Lists ===
>> > * kafka-private for private PMC discussions (with moderated subscriptions)
>> > * kafka-dev
>> > * kafka-commits
>> > * kafka-user
>> >
>> > === Subversion Directory ===
>> > [https://svn.apache.org/repos/asf/incubator/kafka]
>> >
>> > === Issue Tracking ===
>> > JIRA Kafka (KAFKA)
>> >
>> > === Other Resources ===
>> > The existing code already has unit tests, so we would like a Hudson instance
>> > to run them whenever a new patch is submitted. This can be added after
>> > project creation.
>> >
>> > == Initial Committers ==
>> > * Jay Kreps
>> > * Jun Rao
>> > * Neha Narkhede
>> > * Jakob Homan
>> > * Phillip Rhodes
>> > * Henry Saputra
>> > * Chris Burroughs
>> >
>> > == Affiliations ==
>> > * Jay Kreps (LinkedIn)
>> > * Jun Rao (LinkedIn)
>> > * Neha Narkhede (LinkedIn)
>> > * Jakob Homan (LinkedIn)
>> > * Phillip Rhodes (Fogbeam Labs)
>> > * Henry Saputra (Cisco Systems)
>> > * Chris Burroughs (Clearspring Technologies)
>> >
>> > == Sponsors ==
>> > === Champion ===
>> > Chris Douglas (Apache Member)
>> >
>> > === Nominated Mentors ===
>> > * Alan Cabrera (Apache Member)
>> > * Geir Magnusson, Jr. (Apache Member and Director)
>> > * Owen O'Malley (Apache Member)
>> >
>> > === Sponsoring Entity ===
>> > We are requesting the Incubator to sponsor this project.
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>>
>>
>
>
> --
> Joe Andrew Key (Andy)
>

