From: Henry Saputra
To: "general@incubator.apache.org"
Reply-To: general@incubator.apache.org
Date: Wed, 4 Sep 2013 11:50:42 -0700
Subject: Re: [PROPOSAL] Storm for Apache Incubator
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=90e6ba61403447ebf204e59348ef

--90e6ba61403447ebf204e59348ef
Content-Type: text/plain; charset=UTF-8

Excited about Storm coming to Apache.

A small comment about the mailing lists: for starting in the Incubator, you
may want to propose these instead:

* storm-dev
* storm-commits
* storm-private (with moderated subscriptions)

However, since Storm is already a well-known open source project, it may be
valid to have storm-user from the beginning. I still think you will need a
storm-commits list to separate commit logs from dev discussions.

Mentors can chime in about this.

Thanks,

Henry

On Wed, Sep 4, 2013 at 1:07 AM, Nathan Marz wrote:

> Hi everyone,
>
> I'd like to propose Storm to be an Apache Incubator project. After much
> thought I believe this is the right next step for the project, and I look
> forward to hearing everyone's thoughts and feedback!
>
> Here's a link to the proposal:
> https://wiki.apache.org/incubator/StormProposal
>
> The proposal is also pasted below.
>
> -Nathan
>
>
> = Storm Proposal =
>
> == Abstract ==
>
> Storm is a distributed, fault-tolerant, and high-performance realtime
> computation system that provides strong guarantees on the processing of
> data.
>
> == Proposal ==
>
> Storm is a distributed real-time computation system.
> Similar to how Hadoop
> provides a set of general primitives for doing batch processing, Storm
> provides a set of general primitives for doing real-time computation. Its
> use cases span stream processing, distributed RPC, continuous computation,
> and more. Storm has become a preferred technology for near-realtime
> big-data processing by many organizations worldwide (see a partial list at
> https://github.com/nathanmarz/storm/wiki/Powered-By). As an open source
> project, Storm’s developer community has grown rapidly to 46 members.
>
> == Background ==
>
> The past decade has seen a revolution in data processing. MapReduce,
> Hadoop, and related technologies have made it possible to store and process
> data at scales previously unthinkable. Unfortunately, these data processing
> technologies are not realtime systems, nor are they meant to be. The lack
> of a "Hadoop of realtime" has become the biggest hole in the data
> processing ecosystem. Storm fills that hole.
>
> Storm was initially developed and deployed at BackType in 2011. After 7
> months of development BackType was acquired by Twitter in July 2011. Storm
> was open sourced in September 2011.
>
> Storm has been under continuous development on its Github repository since
> being open-sourced. It has undergone four major releases (0.5, 0.6, 0.7,
> 0.8) and many minor ones.
>
> == Rationale ==
>
> Storm is a general platform for low-latency big-data processing. It is
> complementary to the existing Apache projects, such as Hadoop. Many
> applications are actually exploring using both Hadoop and Storm for
> big-data processing. Bringing Storm into Apache is very beneficial to both
> Apache community and Storm community.
>
> The rapid growth of Storm community is empowered by open source.
> We believe
> the Apache foundation is a great fit as the long-term home for Storm, as it
> provides an established process for community-driven development and
> decision making by consensus. This is exactly the model we want for future
> Storm development.
>
> == Initial Goals ==
>
> * Move the existing codebase to Apache
> * Integrate with the Apache development process
> * Ensure all dependencies are compliant with Apache License version 2.0
> * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> Storm has undergone four major releases (0.5, 0.6, 0.7, 0.8) and many minor
> ones. Storm 0.9 is about to be released. Storm is being used in production
> by over 50 organizations. The Storm codebase is currently hosted at
> github.com, which will seed the Apache git repository.
>
> === Meritocracy ===
>
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. Several companies have already expressed
> interest in this project, and we intend to invite additional developers to
> participate. We will encourage and monitor community participation so that
> privileges can be extended to those that contribute.
>
> === Community ===
>
> The need for a low-latency big-data processing platform in open source
> is tremendous. Storm is currently being used by at least 50 organizations
> worldwide (see https://github.com/nathanmarz/storm/wiki/Powered-By), and is
> the most starred Java project on Github. By bringing Storm into Apache, we
> believe that the community will grow even bigger.
>
> === Core Developers ===
>
> Storm was started by Nathan Marz at BackType, and now has developers from
> Yahoo!, Microsoft, Alibaba, Infochimps, and many other companies.
>
> === Alignment ===
>
> In the big-data processing ecosystem, Storm is a very popular low-latency
> platform, while Hadoop is the primary platform for batch processing. We
> believe that having Hadoop and Storm aligned within the Apache foundation
> will help the further growth of the big-data community. The alignment is
> also beneficial to other Apache communities (such as Zookeeper, Thrift,
> Mesos). We could include additional sub-projects, Storm-on-YARN and
> Storm-on-Mesos, in the near future.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The risk of the Storm project being abandoned is minimal. At least 50
> organizations (Twitter, Yahoo!, Microsoft, Groupon, Baidu, Alibaba,
> Alipay, Taobao, PARC, RocketFuel, etc.) are highly incentivized to
> continue development. Many of these organizations have built critical
> business applications upon Storm, and have devoted significant internal
> infrastructure investment in Storm.
>
> === Inexperience with Open Source ===
>
> Storm has existed as a healthy open source project for several years.
> During that time, we have curated an open-source community successfully,
> attracting over 40 developers from a diverse group of companies including
> Twitter, Yahoo!, and Alibaba.
>
> === Homogenous Developers ===
>
> The initial committers are employed by large companies (including Twitter,
> Yahoo!, Alibaba, Microsoft) and well-funded startups. Storm has an active
> community of developers, and we are committed to recruiting additional
> committers based on their contributions to the project.
>
> === Reliance on Salaried Developers ===
>
> It is expected that Storm development will occur on both salaried time and
> on volunteer time, after hours. The majority of initial committers are paid
> by their employer to contribute to this project.
> However, they are all
> passionate about the project, and we are confident that the project will
> continue even if no salaried developers contribute to the project. We are
> committed to recruiting additional committers including non-salaried
> developers.
>
> === Relationships with Other Apache Products ===
>
> As mentioned in the Alignment section, Storm is closely integrated with
> Hadoop, Zookeeper, Thrift, YARN and Mesos in numerous ways. We look
> forward to collaborating with those communities, as well as other Apache
> communities (including Apache S4 which focuses on stateful low-latency
> processing).
>
> === An Excessive Fascination with the Apache Brand ===
>
> Storm is already a healthy and well known open source project. This
> proposal is not for the purpose of generating publicity. Rather, the
> primary benefits to joining Apache are those outlined in the Rationale
> section.
>
> == Documentation ==
>
> The reader will find these websites highly relevant:
>
> * Storm website: http://storm-project.net
> * Storm documentation: https://github.com/nathanmarz/storm/wiki
> * Codebase: https://github.com/nathanmarz/storm
> * User group: https://groups.google.com/group/storm-user
>
> == Source and Intellectual Property Submission Plan ==
>
> The Storm codebase is currently hosted on Github:
> https://github.com/nathanmarz/storm.
>
> This is the exact codebase that we would migrate to the Apache foundation.
>
> The Storm source code is currently licensed under Eclipse Public License
> Version 1.0. Some source code was contributed under a contributor agreement
> based on the Sun contributor agreement (v1.5). More recent code has been
> contributed under an Apache style agreement (see
> https://dl.dropboxusercontent.com/u/133901206/storm-apache-style-cla.txt).
>
> Upon entering Apache, Storm will migrate to the Apache License 2.0 with all
> contributions licensed to the Apache Foundation.
> In certain cases where
> individuals or organizations hold copyright, we will ensure they grant a
> license to the Apache Foundation. Going forward, all commits will be
> licensed directly to the Apache foundation through our signed Individual
> Contributor License Agreements for all committers on the project.
>
> Yahoo! is also willing to move the Storm-on-YARN code from github to be a
> subproject of the Apache Storm project. Storm-on-YARN is currently licensed
> under Apache License 2.0 and receives contributions under an Apache style
> CLA. Upon entering Apache, Yahoo! will sign over copyright to the Apache
> foundation.
>
> == External Dependencies ==
>
> To the best of our knowledge, all of Storm's dependencies (except 0MQ/JMQ)
> are distributed under Apache compatible licenses. Upon acceptance to the
> incubator, we would begin a thorough analysis of all transitive
> dependencies to verify this fact and introduce license checking into the
> build and release process (for instance integrating Apache Rat).
>
> Storm has used 0MQ and JMQ as the default mechanism for its internal
> messaging layer, and 0MQ/JMQ is licensed under the GNU Lesser General
> Public License. Recently, we have made the Storm messaging layer pluggable,
> and plan to use Netty (which is licensed under Apache License v2) as our
> default messaging plugin (while keeping 0MQ as an optional plugin).
>
> == Cryptography ==
>
> We do not expect Storm to be a controlled export item due to the use of
> encryption.
>
> Storm enables encryption via 2 plugins:
>
>    * SASL authentication plugins … Currently, we provide “no-op”
>    authentication and digest authentication. In the near future, we will
>    introduce Kerberos authentication.
>    * Tuple payload serialization plugins … Storm provides plugins for
>    plain-object serialization and Blowfish encryption.
>
> == Required Resources ==
>
> === Mailing lists ===
>
> * storm-user
> * storm-dev
> * storm-private (with moderated subscriptions)
>
> === Subversion Directory ===
>
> Git is the preferred source control system: git://git.apache.org/storm
>
> === Issue Tracking ===
>
> JIRA Storm (STORM)
>
> == Initial Committers ==
>
> * Nathan Marz
> * James Xu
> * Jason Jackson
> * Andy Feng
> * Flip Kromer
> * David Lao
> * P. Taylor Goetz
>
> == Affiliations ==
>
> * Nathan Marz - Nathan’s Startup
> * James Xu - Alibaba
> * Jason Jackson - Twitter
> * Andy Feng - Yahoo!
> * Flip Kromer - Infochimps
> * David Lao - Microsoft
> * P. Taylor Goetz - Health Market Science
>
> == Sponsors ==
>
> === Champion ===
>
> * Doug Cutting
>
> === Nominated Mentors ===
>
> * Ted Dunning
> * Arvind Prabhaker
> * Devaraj Das
>
> === Sponsoring Entity ===
>
> The Apache Incubator
>
--90e6ba61403447ebf204e59348ef--