incubator-general mailing list archives

From "Alan D. Cabrera" <l...@toolazydogs.com>
Subject Re: [PROPOSAL] Storm for Apache Incubator
Date Wed, 04 Sep 2013 17:14:33 GMT
Are we voting?

Regards,
Alan

On Sep 4, 2013, at 8:33 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> +1 binding
> 
> 
> On Wed, Sep 4, 2013 at 7:52 AM, Suresh Srinivas <suresh@hortonworks.com> wrote:
> 
>> +1 (non-binding)
>> 
>> Sent from phone
>> 
>> On Sep 4, 2013, at 1:07 AM, Nathan Marz <nathan@nathanmarz.com> wrote:
>> 
>>> Hi everyone,
>>> 
>>> I'd like to propose Storm to be an Apache Incubator project. After much
>>> thought I believe this is the right next step for the project, and I look
>>> forward to hearing everyone's thoughts and feedback!
>>> 
>>> Here's a link to the proposal:
>>> https://wiki.apache.org/incubator/StormProposal
>>> 
>>> The proposal is also pasted below.
>>> 
>>> -Nathan
>>> 
>>> 
>>> = Storm Proposal =
>>> 
>>> == Abstract ==
>>> 
>>> Storm is a distributed, fault-tolerant, and high-performance realtime
>>> computation system that provides strong guarantees on the processing of
>>> data.
>>> 
>>> == Proposal ==
>>> 
>>> Storm is a distributed real-time computation system. Similar to how Hadoop
>>> provides a set of general primitives for doing batch processing, Storm
>>> provides a set of general primitives for doing real-time computation. Its
>>> use cases span stream processing, distributed RPC, continuous computation,
>>> and more. Storm has become a preferred technology for near-realtime
>>> big-data processing by many organizations worldwide (see a partial list at
>>> https://github.com/nathanmarz/storm/wiki/Powered-By). As an open source
>>> project, Storm’s developer community has grown rapidly to 46 members.
>>> 
>>> == Background ==
>>> 
>>> The past decade has seen a revolution in data processing. MapReduce,
>>> Hadoop, and related technologies have made it possible to store and process
>>> data at scales previously unthinkable. Unfortunately, these data processing
>>> technologies are not realtime systems, nor are they meant to be. The lack
>>> of a "Hadoop of realtime" has become the biggest hole in the data
>>> processing ecosystem. Storm fills that hole.
>>> 
>>> Storm was initially developed and deployed at BackType in 2011. After 7
>>> months of development, BackType was acquired by Twitter in July 2011. Storm
>>> was open sourced in September 2011.
>>> 
>>> Storm has been under continuous development on its GitHub repository since
>>> being open sourced. It has undergone four major releases (0.5, 0.6, 0.7,
>>> 0.8) and many minor ones.
>>> 
>>> == Rationale ==
>>> 
>>> Storm is a general platform for low-latency big-data processing. It is
>>> complementary to existing Apache projects, such as Hadoop. Many
>>> applications are exploring the use of both Hadoop and Storm for big-data
>>> processing. Bringing Storm into Apache would benefit both the Apache
>>> community and the Storm community.
>>> 
>>> The rapid growth of the Storm community is empowered by open source. We
>>> believe the Apache Foundation is a great fit as the long-term home for
>>> Storm, as it provides an established process for community-driven
>>> development and decision making by consensus. This is exactly the model we
>>> want for future Storm development.
>>> 
>>> == Initial Goals ==
>>> 
>>> * Move the existing codebase to Apache
>>> * Integrate with the Apache development process
>>> * Ensure all dependencies are compliant with Apache License version 2.0
>>> * Incremental development and releases per Apache guidelines
>>> 
>>> == Current Status ==
>>> 
>>> Storm has undergone four major releases (0.5, 0.6, 0.7, 0.8) and many minor
>>> ones. Storm 0.9 is about to be released. Storm is being used in production
>>> by over 50 organizations. The Storm codebase, currently hosted at
>>> github.com, will seed the Apache git repository.
>>> 
>>> === Meritocracy ===
>>> 
>>> We plan to invest in supporting a meritocracy. We will discuss the
>>> requirements in an open forum. Several companies have already expressed
>>> interest in this project, and we intend to invite additional developers to
>>> participate. We will encourage and monitor community participation so that
>>> privileges can be extended to those that contribute.
>>> 
>>> === Community ===
>>> 
>>> The need for an open source, low-latency big-data processing platform is
>>> tremendous. Storm is currently being used by at least 50 organizations
>>> worldwide (see https://github.com/nathanmarz/storm/wiki/Powered-By), and is
>>> the most starred Java project on GitHub. By bringing Storm into Apache, we
>>> believe that the community will grow even bigger.
>>> 
>>> === Core Developers ===
>>> 
>>> Storm was started by Nathan Marz at BackType, and now has developers from
>>> Yahoo!, Microsoft, Alibaba, Infochimps, and many other companies.
>>> 
>>> === Alignment ===
>>> 
>>> In the big-data processing ecosystem, Storm is a very popular low-latency
>>> platform, while Hadoop is the primary platform for batch processing. We
>>> believe that aligning Hadoop and Storm within the Apache Foundation will
>>> help the big-data community grow further. The alignment is also beneficial
>>> to other Apache communities (such as Zookeeper, Thrift, and Mesos). We
>>> could include additional sub-projects, Storm-on-YARN and Storm-on-Mesos,
>>> in the near future.
>>> 
>>> == Known Risks ==
>>> 
>>> === Orphaned Products ===
>>> 
>>> The risk of the Storm project being abandoned is minimal. At least 50
>>> organizations (Twitter, Yahoo!, Microsoft, Groupon, Baidu, Alibaba,
>>> Alipay, Taobao, PARC, RocketFuel, etc.) are highly incentivized to
>>> continue development. Many of these organizations have built critical
>>> business applications upon Storm, and have made significant internal
>>> infrastructure investments in Storm.
>>> 
>>> === Inexperience with Open Source ===
>>> 
>>> Storm has existed as a healthy open source project for several years.
>>> During that time, we have curated an open-source community successfully,
>>> attracting over 40 developers from a diverse group of companies including
>>> Twitter, Yahoo!, and Alibaba.
>>> 
>>> === Homogeneous Developers ===
>>> 
>>> The initial committers are employed by large companies (including Twitter,
>>> Yahoo!, Alibaba, and Microsoft) and well-funded startups. Storm has an
>>> active community of developers, and we are committed to recruiting
>>> additional committers based on their contributions to the project.
>>> 
>>> === Reliance on Salaried Developers ===
>>> 
>>> It is expected that Storm development will occur on both salaried time and
>>> on volunteer time, after hours. The majority of initial committers are paid
>>> by their employers to contribute to this project. However, they are all
>>> passionate about the project, and we are confident that the project will
>>> continue even if no salaried developers contribute to it. We are
>>> committed to recruiting additional committers, including non-salaried
>>> developers.
>>> 
>>> === Relationships with Other Apache Products ===
>>> 
>>> As mentioned in the Alignment section, Storm is closely integrated with
>>> Hadoop, Zookeeper, Thrift, YARN, and Mesos in numerous ways. We look
>>> forward to collaborating with those communities, as well as other Apache
>>> communities (including Apache S4, which focuses on stateful low-latency
>>> processing).
>>> 
>>> === An Excessive Fascination with the Apache Brand ===
>>> 
>>> Storm is already a healthy and well known open source project. This
>>> proposal is not for the purpose of generating publicity. Rather, the
>>> primary benefits to joining Apache are those outlined in the Rationale
>>> section.
>>> 
>>> == Documentation ==
>>> 
>>> The reader will find these websites highly relevant:
>>> 
>>> * Storm website: http://storm-project.net
>>> * Storm documentation: https://github.com/nathanmarz/storm/wiki
>>> * Codebase: https://github.com/nathanmarz/storm
>>> * User group: https://groups.google.com/group/storm-user
>>> 
>>> == Source and Intellectual Property Submission Plan ==
>>> 
>>> The Storm codebase is currently hosted on GitHub:
>>> https://github.com/nathanmarz/storm.
>>> 
>>> This is the exact codebase that we would migrate to the Apache Foundation.
>>> 
>>> The Storm source code is currently licensed under the Eclipse Public
>>> License Version 1.0. Some source code was contributed under a contributor
>>> agreement based on the Sun contributor agreement (v1.5). More recent code
>>> has been contributed under an Apache-style agreement (see
>>> https://dl.dropboxusercontent.com/u/133901206/storm-apache-style-cla.txt).
>>> 
>>> Upon entering Apache, Storm will migrate to the Apache License 2.0, with
>>> all contributions licensed to the Apache Foundation. In certain cases where
>>> individuals or organizations hold copyright, we will ensure they grant a
>>> license to the Apache Foundation. Going forward, all commits will be
>>> licensed directly to the Apache Foundation through signed Individual
>>> Contributor License Agreements for all committers on the project.
>>> 
>>> Yahoo! is also willing to move the Storm-on-YARN code from GitHub to be a
>>> subproject of the Apache Storm project. Storm-on-YARN is currently licensed
>>> under the Apache License 2.0 and receives contributions under an
>>> Apache-style CLA. Upon entering Apache, Yahoo! will sign over copyright to
>>> the Apache Foundation.
>>> 
>>> == External Dependencies ==
>>> 
>>> To the best of our knowledge, all of Storm's dependencies (except 0MQ/JMQ)
>>> are distributed under Apache-compatible licenses. Upon acceptance to the
>>> incubator, we would begin a thorough analysis of all transitive
>>> dependencies to verify this and introduce license checking into the
>>> build and release process (for instance, by integrating Apache Rat).
>>> 
>>> Storm has used 0MQ and JMQ as the default mechanism for its internal
>>> messaging layer, and 0MQ/JMQ is licensed under the GNU Lesser General
>>> Public License. Recently, we have made the Storm messaging layer pluggable,
>>> and plan to use Netty (which is licensed under Apache License v2) as our
>>> default messaging plugin (while keeping 0MQ as an optional plugin).
>>> 
>>> == Cryptography ==
>>> 
>>> We do not expect Storm to be a controlled export item due to the use of
>>> encryption.
>>> 
>>> Storm enables encryption via two kinds of plugins:
>>> 
>>> * SASL authentication plugins … Currently, we provide “no-op”
>>> authentication and digest authentication. In the near future, we will
>>> introduce Kerberos authentication.
>>> * Tuple payload serialization plugins … Storm provides plugins for
>>> plain-object serialization and Blowfish encryption.
>>> 
>>> == Required Resources ==
>>> 
>>> === Mailing lists ===
>>> 
>>> * storm-user
>>> * storm-dev
>>> * storm-private (with moderated subscriptions)
>>> 
>>> === Subversion Directory ===
>>> 
>>> Git is the preferred source control system: git://git.apache.org/storm
>>> 
>>> === Issue Tracking ===
>>> 
>>> JIRA Storm (STORM)
>>> 
>>> == Initial Committers ==
>>> 
>>> * Nathan Marz <nathan at nathanmarz dot com>
>>> * James Xu <xumingmingv at gmail dot com>
>>> * Jason Jackson <jason at cvk dot ca>
>>> * Andy Feng <afeng at yahoo-inc dot com>
>>> * Flip Kromer <flip at infochimps dot com>
>>> * David Lao <davidlao at microsoft dot com>
>>> * P. Taylor Goetz <ptgoetz at gmail dot com>
>>> 
>>> == Affiliations ==
>>> 
>>> * Nathan Marz - Nathan’s Startup
>>> * James Xu - Alibaba
>>> * Jason Jackson - Twitter
>>> * Andy Feng - Yahoo!
>>> * Flip Kromer - Infochimps
>>> * David Lao - Microsoft
>>> * P. Taylor Goetz - Health Market Science
>>> 
>>> == Sponsors ==
>>> 
>>> === Champion ===
>>> 
>>> * Doug Cutting <cutting at apache dot org>
>>> 
>>> === Nominated Mentors ===
>>> 
>>> * Ted Dunning <tdunning at maprtech dot com>
>>> * Arvind Prabhaker <arvind at apache dot org>
>>> * Devaraj Das <ddas at hortonworks dot com>
>>> 
>>> === Sponsoring Entity ===
>>> 
>>> The Apache Incubator
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>> 
>> 



