incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Arun C Murthy <...@hortonworks.com>
Subject Re: [VOTE] Accept Samza into the Incubator
Date Mon, 29 Jul 2013 22:59:12 GMT
+1 (binding)

Arun

On Jul 26, 2013, at 12:52 PM, Jakob Homan <jghoman@gmail.com> wrote:

> Incubator-
> 
> Following the discussion earlier this week, I'm calling a vote to accept
> Samza as a new Incubator project.
> 
> The proposal draft is available at:
> https://wiki.apache.org/incubator/SamzaProposal,
> and is also included below. It is identical as what was proposed in the
> discussion except for removing the user list, per Marvin's suggestion.
> 
> Vote is open for at least 96h and closes at the earliest on 30 July 13:00
> PDT.  I'm letting the vote run an extra day as we're bookending the weekend
> and I want to give everybody a reasonable workweek margin.
> 
> [ ] +1 accept Samza in the Incubator
> [ ] +/-0
> [ ] -1 because...
> 
> Here's my binding +1
> 
> -Jakob
> 
> ========================================================================================================
> 
> Abstract
> 
> Samza is a stream processing system for running continuous computation on
> infinite streams of data.
> 
> Proposal
> 
> Samza provides a system for processing stream data from publish-subscribe
> systems such as Apache Kafka. The developer writes a stream processing
> task, and executes it as a Samza job. Samza then routes messages between
> stream processing tasks and the publish-subscribe systems that the messages
> are addressed to.
> 
> Background
> 
> Samza was developed at LinkedIn to enable easier processing of streaming
> data on top of Apache Kafka. Current use cases include content processing
> pipelines, aggregating operational log data, data ingestion into
> distributed database infrastructure, and measuring user activity across
> different aggregation types.
> 
> Samza is focused on providing an easy to use framework to process streams.
> It uses Apache YARN to provide a mechanism for deploying stream processing
> tasks in a distributed cluster. Samza also takes advantage of YARN to make
> decisions about stream processor locality, co-partition of streams, and
> provide security. Apache Kafka is also leveraged to provide a mechanism to
> pass messages from one stream processor to the next. Apache Kafka is also
> used to help manage a stream processor's state, so that it can be recovered
> in the event of a failure.
> 
> Samza is written in Scala. It was developed internally at LinkedIn to meet
> our particular use cases, but will be useful to many organizations facing a
> similar need to reliably process large amounts of streaming data.
> Therefore, we would like to share it the ASF and begin developing a
> community of developers and users within Apache.
> 
> Rationale
> 
> Many organizations can benefit from a reliable stream processing system
> such as Samza. While our use case of processing events from a large website
> like LinkedIn has driven the design of Samza, its uses are varied and we
> expect many new use cases to emerge. Samza provides a generic API to
> process messages from streaming infrastructure and will appeal to many
> users.
> 
> Current Status
> 
> Meritocracy
> 
> Our intent with this incubator proposal is to start building a diverse
> developer community around Samza following the Apache meritocracy model.
> Since Samza was initially developed in late 2011, we have had fast adoption
> and contributions by multiple teams at LinkedIn. We plan to continue
> support for new contributors and work with those who contribute
> significantly to the project to make them committers.
> 
> Community
> 
> Samza is currently being used internally at LinkedIn. We hope to extend our
> contributor base significantly and invite all those who are interested in
> building large-scale distributed systems to participate.
> 
> Core Developers
> 
> Samza is currently being developed by four engineers at LinkedIn: Jay
> Kreps, Jakob Homan, Sriram Subramanian, and Chris Riccomini. Jakob is an
> ASF Member, Incubator PMC member and PMC member on Apache Hadoop, Kafka and
> Giraph. Jay is a member of the Apache Kafka PMC and contributor to various
> Apache projects. Chris has been an active contributor for several projects
> including Apache Kafka and Apache YARN. Sriram has contributed to Samza, as
> well as Apache Kafka.
> 
> Alignment
> 
> The ASF is the natural choice to host the Samza project as its goal of
> encouraging community-driven open-source projects fits with our vision for
> Samza. Additionally, many other projects with which we are familiar with
> and expect Samza to integrate with, such as Apache ZooKeeper, YARN, HDFS
> and log4j are hosted by the ASF and we will benefit and provide benefit by
> close proximity to them.
> 
> Known Risks
> 
> Orphaned Products
> 
> The core developers plan to work full time on the project. There is very
> little risk of Samza being abandoned as it is part of LinkedIn's internal
> infrastructure.
> 
> Inexperience with Open Source
> 
> All of the core developers have experience with open source development.
> Jay and Chris has been involved with several open source projects released
> by LinkedIn, and Jay is a committer on Apache Kafka. Jakob has been
> actively involved with the ASF as a full-time Hadoop committer and PMC
> member. Sriram is a contributor to Apache Kafka.
> 
> Homogeneous Developers
> 
> The current core developers are all from LinkedIn. However, we hope to
> establish a developer community that includes contributors from several
> corporations and we actively encouraging new contributors via the mailing
> lists and public presentations of Samza.
> 
> Reliance on Salaried Developers
> 
> Currently, the developers are paid to do work on Samza. However, once the
> project has a community built around it, we expect to get committers,
> developers and community from outside the current core developers. However,
> because LinkedIn relies on Samza internally, the reliance on salaried
> developers is unlikely to change.
> 
> Relationships with Other Apache Products
> 
> Samza is deeply integrated with Apache products. Samza uses Apache Kafka as
> its underlying message passing system. Samza also uses Apache YARN for task
> scheduling. Both YARN and Kafka, in turn, rely on Apache ZooKeeper for
> coordination. In addition, we hope to integrate with Apache HDFS in the
> near future.
> 
> An Excessive Fascination with the Apache Brand
> 
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Samza a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and
> Alignment sections.
> 
> Documentation
> 
> http://wiki.apache.org/incubator/SamzaProposal
> 
> Initial Source
> 
> Available upon request.
> 
> External Dependencies
> 
> The dependencies all have Apache compatible licenses.
> 
> metrics (Apache 2.0)
> zkclient (Apache 2.0)
> zookeeper (Apache 2.0)
> jetty (Apache 2.0)
> jackson (Apache 2.0)
> commons-httpclient (Apache 2.0)
> slf4j (MIT)
> avro (Apache 2.0)
> hadoop (Apache 2.0)
> junit (Common Public License)
> grizzled-slf4j (BSD)
> scalatra (https://github.com/scalatra/scalatra/blob/develop/LICENSE)
> scala (http://www.scala-lang.org/node/146)
> joptsimple (MIT)
> kafka (Apache 2.0)
> scalate (Apache 2.0)
> leveldb jni (BSD)
> Cryptography
> 
> Samza will depend on secure Hadoop, which can optionally use Kerberos.
> 
> Required Resources
> 
> Mailing Lists
> 
> samza-private for private PMC discussions (with moderated subscriptions)
> samza-dev
> samza-commits
> 
> Subversion Directory
> 
> Git is the preferred source control system: git://git.apache.org/samza
> 
> Issue Tracking
> 
> JIRA Samza (SAMZA)
> 
> Other Resources
> 
> The existing code already has unit tests, so we would like a Hudson
> instance to run them whenever a new patch is submitted. This can be added
> after project creation.
> 
> Initial Committers
> 
> Jay Kreps
> Jakob Homan
> Chris Riccomini
> Sriram Subramanian
> Affiliations
> 
> Jay Kreps (LinkedIn)
> Jakob Homan (LinkedIn)
> Chris Riccomini (LinkedIn)
> Sriram Subramanian (LinkedIn)
> Sponsors
> 
> Champion
> 
> Jakob Homan (Apache Member)
> 
> Nominated Mentors
> 
> Arun C Murthy <acmurthy at apache dot org>
> Chris Douglas <cdouglas at apache dot org>
> Roman Shaposhnik <rvs at apache dot org>
> 
> Sponsoring Entity
> 
> We are requesting the Incubator to sponsor this project.

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message