incubator-general mailing list archives

From Joe Key <joeandrew...@gmail.com>
Subject Re: [VOTE] Kafka to join the Incubator
Date Tue, 28 Jun 2011 17:57:10 GMT
+1

Sincerely,
J. Andrew Key (Andy)

On Tue, Jun 28, 2011 at 10:32 AM, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> +1 (binding).
>
> Thanks!
>
> Cheers,
> Chris
>
> On Jun 28, 2011, at 10:00 AM, Jun Rao wrote:
>
> > Hi all,
> >
> >
> > Since the discussion on the thread of the Kafka incubator proposal is
> > winding down, I'd like to call a vote.
> >
> > At the end of this mail, I've put a copy of the current proposal.  Here is
> > a link to the document in the wiki:
> > http://wiki.apache.org/incubator/KafkaProposal
> >
> > And here is a link to the discussion thread:
> > http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
> >
> > Please cast your votes:
> >
> > [  ] +1 Accept Kafka for incubation
> > [  ] +0 Indifferent to Kafka incubation
> > [  ]  -1 Reject Kafka for incubation
> >
> > This vote will close 72 hours from now.
> >
> > Thanks,
> >
> > Jun
> >
> > == Abstract ==
> > Kafka is a distributed publish-subscribe system for processing large amounts
> > of streaming data.
> >
> > == Proposal ==
> > Kafka provides an extremely high throughput distributed publish/subscribe
> > messaging system.  Additionally, it supports relatively long term
> > persistence of messages to support a wide variety of consumers, partitioning
> > of the message stream across servers and consumers, and functionality for
> > loading data into Apache Hadoop for offline, batch processing.
> >
> > == Background ==
> > Kafka was developed at LinkedIn to process the large amounts of events
> > generated by that company's website and provide a common repository for many
> > types of consumers to access and process those events. Kafka has been used
> > in production at LinkedIn scale to handle dozens of types of events
> > including page views, searches and social network activity. Kafka clusters
> > at LinkedIn currently process more than two billion events per day.
> >
> > Kafka fills the gap between messaging systems such as Apache ActiveMQ, which
> > provide low latency message delivery but don't focus on throughput, and log
> > processing systems such as Scribe and Flume, which do not provide adequate
> > latency for our diverse set of consumers.  Kafka can also be inserted into
> > traditional log-processing systems, acting as an intermediate step before
> > further processing. Kafka focuses relentlessly on performance and throughput
> > by not introspecting into message content, nor indexing them on the broker.
> > We also achieve high performance by depending on Java's sendFile/transferTo
> > capabilities to minimize intermediate buffer copies and relying on the OS's
> > pagecache to efficiently serve up message contents to consumers. Kafka is
> > also designed to be scalable and it depends on Apache ZooKeeper for
> > coordination amongst its producers, brokers and consumers.
> >
> > Kafka is written in Scala. It was developed internally at LinkedIn to meet
> > our particular use cases, but will be useful to many organizations facing a
> > similar need to reliably process large amounts of streaming data.
> > Therefore, we would like to share it with the ASF and begin developing a
> > community of developers and users within Apache.
> >
> > == Rationale ==
> > Many organizations can benefit from a reliable stream processing system such
> > as Kafka.  While our use case of processing events from a very large website
> > like LinkedIn has driven the design of Kafka, its uses are varied and we
> > expect many new use cases to emerge.  Kafka provides a natural bridge
> > between near real-time event processing and offline batch processing and
> > will appeal to many users.
> >
> > == Current Status ==
> > === Meritocracy ===
> > Our intent with this incubator proposal is to start building a diverse
> > developer community around Kafka following the Apache meritocracy model.
> > Since Kafka was open sourced we have solicited contributions via the website
> > and presentations given to user groups and technical audiences.  We have had
> > positive responses to these and have received several contributions and
> > clients for other languages.  We plan to continue this support for new
> > contributors and work with those who contribute significantly to the project
> > to make them committers.
> >
> > === Community ===
> > Kafka is currently being developed by engineers within LinkedIn and
> > used in production in that company. Additionally, we have active users in or
> > have received contributions from a diverse set of companies including
> > MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
> > presentations of Kafka and its goals garnered much interest from potential
> > contributors. We hope to extend our contributor base significantly and
> > invite all those who are interested in building high-throughput distributed
> > systems to participate.  We have begun receiving contributions from outside
> > of LinkedIn, including clients for several languages such as Ruby, PHP,
> > Clojure, .NET and Python.
> >
> > To further this goal, we use GitHub issue tracking and branching facilities,
> > as well as maintaining a public mailing list via Google Groups.
> >
> > === Core Developers ===
> > Kafka is currently being developed by four engineers at LinkedIn: Neha
> > Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
> > Apache as a Cassandra committer and PMC member. Neha has been an active
> > contributor to several projects LinkedIn has open sourced, including Bobo,
> > Sensei and Zoie. Jay has experience with open source software as the
> > originator of the Project Voldemort project, as well as being active within
> > the Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
> > member and a previous Apache ZooKeeper contributor.
> >
> > === Alignment ===
> > The ASF is the natural choice to host the Kafka project as its goal of
> > encouraging community-driven open-source projects fits with our vision for
> > Kafka.  Additionally, many other projects that we are familiar with and
> > expect Kafka to integrate with, such as Apache Hadoop, Pig, ZooKeeper
> > and log4j, are hosted by the ASF and we will benefit and provide benefit by
> > close proximity to them.
> >
> > == Known Risks ==
> > === Orphaned Products ===
> > The core developers plan to work full time on the project. There is very
> > little risk of Kafka being abandoned as it is a critical part of LinkedIn's
> > internal infrastructure and is in production use.
> >
> > === Inexperience with Open Source ===
> > All of the core developers have experience with open source development.
> > LinkedIn open sourced Kafka several months ago and has been receiving
> > contributions since.  Jun is an Apache Cassandra committer and PMC member.
> > Jay and Neha have been involved with several open source projects released
> > by LinkedIn.  Jakob has been actively involved with the ASF as a full-time
> > Hadoop committer and PMC member.
> >
> > === Homogeneous Developers ===
> > The current core developers are all from LinkedIn. However, we hope to
> > establish a developer community that includes contributors from several
> > corporations and we are actively encouraging new contributors via the mailing
> > lists and public presentations of Kafka.
> >
> > === Reliance on Salaried Developers ===
> > Currently, the developers are paid to do work on Kafka. However, once the
> > project has a community built around it, we expect to get committers,
> > developers and community from outside the current core developers. However,
> > because LinkedIn relies on Kafka internally, the reliance on salaried
> > developers is unlikely to change.
> >
> > === Relationships with Other Apache Products ===
> > Kafka is deeply integrated with Apache products. Kafka uses Apache ZooKeeper
> > to coordinate its state amongst the brokers, consumers, and soon, the
> > producers.  Kafka provides input formats to allow Hadoop MapReduce to load
> > data directly from Kafka.  Kafka provides an appender to allow consuming
> > data directly from Apache log4j.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > While we respect the reputation of the Apache brand and have no doubts that
> > it will attract contributors and users, our interest is primarily to give
> > Kafka a solid home as an open source project following an established
> > development model. We have also given reasons in the Rationale and Alignment
> > sections.
> >
> > == Documentation ==
> > Information about Kafka can be found at [http://sna-projects.com/kafka/]. The
> > following links provide more information about the project:
> >
> > * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
> > * The GitHub site: [https://github.com/kafka-dev/kafka]
> > * Kafka overview from Jay Kreps: [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
> > * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
> > * Kafka paper at NetDB 2011: [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
> >
> > == Initial Source ==
> > Kafka has been under development at LinkedIn since November 2009.  It was
> > open sourced by LinkedIn in January 2011.  It is currently hosted on GitHub
> > under the Apache license at [https://github.com/kafka-dev/kafka]
> >
> > Kafka is mainly written in Scala with some performance testing code in Java.
> > Several clients have been contributed in other languages, including Ruby,
> > PHP, Clojure, .NET and Python.  Its source tree is entirely self-contained
> > and relies on simple build tool (sbt) as its build system and dependency
> > resolution mechanism.
> >
> > == External Dependencies ==
> > The dependencies all have Apache compatible licenses.
> >
> > == Cryptography ==
> > Not applicable.
> >
> > == Required Resources ==
> > === Mailing Lists ===
> > * kafka-private for private PMC discussions (with moderated subscriptions)
> > * kafka-dev
> > * kafka-commits
> > * kafka-user
> >
> > === Subversion Directory ===
> > [https://svn.apache.org/repos/asf/incubator/kafka]
> >
> > === Issue Tracking ===
> > JIRA Kafka (KAFKA)
> >
> > === Other Resources ===
> > The existing code already has unit tests, so we would like a Hudson instance
> > to run them whenever a new patch is submitted. This can be added after
> > project creation.
> >
> > == Initial Committers ==
> > * Jay Kreps
> > * Jun Rao
> > * Neha Narkhede
> > * Jakob Homan
> > * Phillip Rhodes
> > * Henry Saputra
> > * Chris Burroughs
> >
> > == Affiliations ==
> > * Jay Kreps (LinkedIn)
> > * Jun Rao (LinkedIn)
> > * Neha Narkhede (LinkedIn)
> > * Jakob Homan (LinkedIn)
> > * Phillip Rhodes (Fogbeam Labs)
> > * Henry Saputra (Cisco Systems)
> > * Chris Burroughs (Clearspring Technologies)
> >
> > == Sponsors ==
> > === Champion ===
> > Chris Douglas (Apache Member)
> >
> > === Nominated Mentors ===
> > * Alan Cabrera (Apache Member)
> > * Geir Magnusson, Jr. (Apache Member and Director)
> > * Owen O'Malley (Apache Member)
> >
> > === Sponsoring Entity ===
> > We are requesting the Incubator to sponsor this project.
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>


-- 
Joe Andrew Key (Andy)
