incubator-general mailing list archives

From Nathan Marz <nathan.m...@gmail.com>
Subject Re: [PROPOSAL] Storm for Apache Incubator
Date Thu, 05 Sep 2013 02:23:51 GMT
I think that storm-kafka would make sense as a contrib module since it's widely used. I'm not
sure what to do with the other storm-contrib modules. I figure the less code that's part of
the initial repo the better, because there will be fewer contribution/legal issues to sort
out. How about this: we plan to include storm-kafka under a contrib folder of the Apache
Storm project (just because a lot of people depend on it), and we can pull other storm-contrib
modules in if community members show initiative in working on and maintaining them?

If that all sounds good I'll update the proposal accordingly.


On Sep 4, 2013, at 6:41 PM, Joe Stein <cryptcom@gmail.com> wrote:

> What does this mean for storm contribs (
> https://github.com/nathanmarz/storm-contrib)? (spouts & bolts) e.g. for the
> Apache Kafka spout, it is already hard to know which one to use and which is
> best for 0.7.X and 0.8.X-betaX...  Is the Apache Storm project going to
> help corral that, or is it only for Storm core, as the proposal implies with
> only the storm code base https://github.com/nathanmarz/storm being part of
> the project?
> 
> A lot of traffic on the existing user list is about spouts (e.g. the Kafka
> Spout), and I was not sure whether that would still be talked about or
> funneled somewhere else, or what the thoughts/plans were for the existing
> parts built within Storm?
> 
> /*******************************************
> Joe Stein
> Founder, Principal Consultant
> Big Data Open Source Security LLC
> http://www.stealth.ly
> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
> 
> 
> On Wed, Sep 4, 2013 at 4:34 PM, Nathan Marz <nathan@nathanmarz.com> wrote:
> 
>> We definitely need a storm-user list, as the existing Google Groups mailing
>> list for Storm is quite active, so we'll need to transition that over. I
>> agree on adding a storm-commits list and have added it to the proposal.
>> 
>> 
>> On Wed, Sep 4, 2013 at 11:50 AM, Henry Saputra <henry.saputra@gmail.com> wrote:
>> 
>>> Excited about Storm coming to Apache. A small comment about the mailing
>>> lists: you may want to propose having
>>> * storm-dev
>>> * storm-commits
>>> * storm-private (with moderated subscriptions)
>>> 
>>> instead, when starting in the Incubator.
>>> 
>>> However, since Storm is already a well-known open source project, it may be
>>> valid to have storm-user from the beginning. But I think you may need a
>>> storm-commits list to separate the commit log from dev discussions.
>>> Mentors can chime in about this.
>>> 
>>> Thanks,
>>> 
>>> Henry
>>> 
>>> 
>>> 
>>> On Wed, Sep 4, 2013 at 1:07 AM, Nathan Marz <nathan@nathanmarz.com> wrote:
>>> 
>>>> Hi everyone,
>>>> 
>>>> I'd like to propose Storm to be an Apache Incubator project. After much
>>>> thought I believe this is the right next step for the project, and I look
>>>> forward to hearing everyone's thoughts and feedback!
>>>> 
>>>> Here's a link to the proposal:
>>>> https://wiki.apache.org/incubator/StormProposal
>>>> 
>>>> The proposal is also pasted below.
>>>> 
>>>> -Nathan
>>>> 
>>>> 
>>>> = Storm Proposal =
>>>> 
>>>> == Abstract ==
>>>> 
>>>> Storm is a distributed, fault-tolerant, and high-performance realtime
>>>> computation system that provides strong guarantees on the processing of
>>>> data.
>>>> 
>>>> == Proposal ==
>>>> 
>>>> Storm is a distributed real-time computation system. Similar to how Hadoop
>>>> provides a set of general primitives for doing batch processing, Storm
>>>> provides a set of general primitives for doing real-time computation. Its
>>>> use cases span stream processing, distributed RPC, continuous computation,
>>>> and more. Storm has become a preferred technology for near-realtime
>>>> big-data processing by many organizations worldwide (see a partial list at
>>>> https://github.com/nathanmarz/storm/wiki/Powered-By). As an open source
>>>> project, Storm’s developer community has grown rapidly to 46 members.
>>>> 
>>>> == Background ==
>>>> 
>>>> The past decade has seen a revolution in data processing. MapReduce,
>>>> Hadoop, and related technologies have made it possible to store and process
>>>> data at scales previously unthinkable. Unfortunately, these data processing
>>>> technologies are not realtime systems, nor are they meant to be. The lack
>>>> of a "Hadoop of realtime" has become the biggest hole in the data
>>>> processing ecosystem. Storm fills that hole.
>>>> 
>>>> Storm was initially developed and deployed at BackType in 2011. After 7
>>>> months of development BackType was acquired by Twitter in July 2011. Storm
>>>> was open sourced in September 2011.
>>>> 
>>>> Storm has been under continuous development on its Github repository since
>>>> being open-sourced. It has undergone four major releases (0.5, 0.6, 0.7,
>>>> 0.8) and many minor ones.
>>>> 
>>>> == Rationale ==
>>>> 
>>>> Storm is a general platform for low-latency big-data processing. It is
>>>> complementary to existing Apache projects such as Hadoop. Many users are
>>>> actually exploring using both Hadoop and Storm for big-data processing.
>>>> Bringing Storm into Apache would be very beneficial to both the Apache
>>>> community and the Storm community.
>>>> 
>>>> The rapid growth of the Storm community has been driven by open source. We
>>>> believe the Apache Software Foundation is a great fit as the long-term home
>>>> for Storm, as it provides an established process for community-driven
>>>> development and decision making by consensus. This is exactly the model we
>>>> want for future Storm development.
>>>> 
>>>> == Initial Goals ==
>>>> 
>>>>  * Move the existing codebase to Apache
>>>>  * Integrate with the Apache development process
>>>>  * Ensure all dependencies are compliant with Apache License version 2.0
>>>>  * Incremental development and releases per Apache guidelines
>>>> 
>>>> == Current Status ==
>>>> 
>>>> Storm has undergone four major releases (0.5, 0.6, 0.7, 0.8) and many minor
>>>> ones. Storm 0.9 is about to be released. Storm is being used in production
>>>> by over 50 organizations. The Storm codebase is currently hosted at
>>>> github.com, which will seed the Apache git repository.
>>>> 
>>>> === Meritocracy ===
>>>> 
>>>> We plan to invest in supporting a meritocracy. We will discuss the
>>>> requirements in an open forum. Several companies have already expressed
>>>> interest in this project, and we intend to invite additional developers to
>>>> participate. We will encourage and monitor community participation so that
>>>> privileges can be extended to those that contribute.
>>>> 
>>>> === Community ===
>>>> 
>>>> The need for a low-latency big-data processing platform in open source is
>>>> tremendous. Storm is currently being used by at least 50 organizations
>>>> worldwide (see https://github.com/nathanmarz/storm/wiki/Powered-By), and is
>>>> the most starred Java project on GitHub. By bringing Storm into Apache, we
>>>> believe that the community will grow even bigger.
>>>> 
>>>> === Core Developers ===
>>>> 
>>>> Storm was started by Nathan Marz at BackType, and now has developers from
>>>> Yahoo!, Microsoft, Alibaba, Infochimps, and many other companies.
>>>> 
>>>> === Alignment ===
>>>> 
>>>> In the big-data processing ecosystem, Storm is a very popular low-latency
>>>> platform, while Hadoop is the primary platform for batch processing. We
>>>> believe that having Hadoop and Storm aligned within the Apache Software
>>>> Foundation will help the big-data community grow further. The alignment is
>>>> also beneficial to other Apache communities (such as ZooKeeper, Thrift,
>>>> Mesos). We could include additional sub-projects, Storm-on-YARN and
>>>> Storm-on-Mesos, in the near future.
>>>> 
>>>> == Known Risks ==
>>>> 
>>>> === Orphaned Products ===
>>>> 
>>>> The risk of the Storm project being abandoned is minimal. At least 50
>>>> organizations (Twitter, Yahoo!, Microsoft, Groupon, Baidu, Alibaba,
>>>> Alipay, Taobao, PARC, RocketFuel, etc.) are highly incentivized to
>>>> continue development. Many of these organizations have built critical
>>>> business applications upon Storm, and have made significant internal
>>>> infrastructure investments in Storm.
>>>> 
>>>> === Inexperience with Open Source ===
>>>> 
>>>> Storm has existed as a healthy open source project for several years.
>>>> During that time, we have successfully built an open-source community,
>>>> attracting over 40 developers from a diverse group of companies including
>>>> Twitter, Yahoo!, and Alibaba.
>>>> 
>>>> === Homogenous Developers ===
>>>> 
>>>> The initial committers are employed by large companies (including Twitter,
>>>> Yahoo!, Alibaba, Microsoft) and well-funded startups. Storm has an active
>>>> community of developers, and we are committed to recruiting additional
>>>> committers based on their contributions to the project.
>>>> 
>>>> === Reliance on Salaried Developers ===
>>>> 
>>>> It is expected that Storm development will occur on both salaried time and
>>>> on volunteer time, after hours. The majority of initial committers are paid
>>>> by their employer to contribute to this project. However, they are all
>>>> passionate about the project, and we are confident that the project will
>>>> continue even if no salaried developers contribute to the project. We are
>>>> committed to recruiting additional committers including non-salaried
>>>> developers.
>>>> 
>>>> === Relationships with Other Apache Products ===
>>>> 
>>>> As mentioned in the Alignment section, Storm is closely integrated with
>>>> Hadoop, ZooKeeper, Thrift, YARN and Mesos in numerous ways. We look forward
>>>> to collaborating with those communities, as well as other Apache communities
>>>> (including Apache S4, which focuses on stateful low-latency processing).
>>>> 
>>>> === An Excessive Fascination with the Apache Brand ===
>>>> 
>>>> Storm is already a healthy and well known open source project. This
>>>> proposal is not for the purpose of generating publicity. Rather, the
>>>> primary benefits to joining Apache are those outlined in the Rationale
>>>> section.
>>>> 
>>>> == Documentation ==
>>>> 
>>>> The reader will find these websites highly relevant:
>>>> 
>>>>  * Storm website: http://storm-project.net
>>>>  * Storm documentation: https://github.com/nathanmarz/storm/wiki
>>>>  * Codebase: https://github.com/nathanmarz/storm
>>>>  * User group: https://groups.google.com/group/storm-user
>>>> 
>>>> == Source and Intellectual Property Submission Plan ==
>>>> 
>>>> The Storm codebase is currently hosted on Github:
>>>> https://github.com/nathanmarz/storm.
>>>> 
>>>> This is the exact codebase that we would migrate to the Apache foundation.
>>>> 
>>>> The Storm source code is currently licensed under the Eclipse Public License
>>>> Version 1.0. Some source code was contributed under a contributor agreement
>>>> based on the Sun contributor agreement (v1.5). More recent code has been
>>>> contributed under an Apache-style agreement (see
>>>> https://dl.dropboxusercontent.com/u/133901206/storm-apache-style-cla.txt).
>>>> 
>>>> Upon entering Apache, Storm will migrate to the Apache License 2.0 with all
>>>> contributions licensed to the Apache Foundation. In certain cases where
>>>> individuals or organizations hold copyright, we will ensure they grant a
>>>> license to the Apache Foundation. Going forward, all commits will be
>>>> licensed directly to the Apache Foundation through signed Individual
>>>> Contributor License Agreements for all committers on the project.
>>>> 
>>>> Yahoo! is also willing to move the Storm-on-YARN code from GitHub to become
>>>> a subproject of the Apache Storm project. Storm-on-YARN is currently
>>>> licensed under the Apache License 2.0 and receives contributions under an
>>>> Apache-style CLA. Upon entering Apache, Yahoo! will sign over copyright to
>>>> the Apache Foundation.
>>>> 
>>>> == External Dependencies ==
>>>> 
>>>> To the best of our knowledge, all of Storm's dependencies (except 0MQ/JMQ)
>>>> are distributed under Apache-compatible licenses. Upon acceptance to the
>>>> incubator, we would begin a thorough analysis of all transitive
>>>> dependencies to verify this fact and introduce license checking into the
>>>> build and release process (for instance, by integrating Apache Rat).
>>>> 
>>>> Storm has used 0MQ and JMQ as the default mechanism for its internal
>>>> messaging layer, and 0MQ/JMQ is licensed under the GNU Lesser General Public
>>>> License. Recently, we have made the Storm messaging layer pluggable, and plan
>>>> to use Netty (which is licensed under the Apache License v2) as our default
>>>> messaging plugin (while keeping 0MQ as an optional plugin).
>>>> 
>>>> == Cryptography ==
>>>> 
>>>> We do not expect Storm to be a controlled export item due to the use of
>>>> encryption.
>>>> 
>>>> Storm enables encryption via two plugins:
>>>> 
>>>>  * SASL authentication plugins: currently, we provide “no-op”
>>>> authentication and digest authentication. In the near future, we will
>>>> introduce Kerberos authentication.
>>>>  * Tuple payload serialization plugins: Storm provides plugins for
>>>> plain-object serialization and Blowfish encryption.
>>>> 
>>>> == Required Resources ==
>>>> 
>>>> === Mailing lists ===
>>>> 
>>>> * storm-user
>>>> * storm-dev
>>>> * storm-private (with moderated subscriptions)
>>>> 
>>>> === Subversion Directory ===
>>>> 
>>>> Git is the preferred source control system: git://git.apache.org/storm
>>>> 
>>>> === Issue Tracking ===
>>>> 
>>>> JIRA Storm (STORM)
>>>> 
>>>> == Initial Committers ==
>>>> 
>>>>  * Nathan Marz <nathan at nathanmarz dot com>
>>>>  * James Xu <xumingmingv at gmail dot com>
>>>>  * Jason Jackson <jason at cvk dot ca>
>>>>  * Andy Feng <afeng at yahoo-inc dot com>
>>>>  * Flip Kromer  <flip at infochimps dot com>
>>>>  * David Lao <davidlao at microsoft dot com>
>>>>  * P. Taylor Goetz <ptgoetz at gmail dot com>
>>>> 
>>>> == Affiliations ==
>>>> 
>>>>  * Nathan Marz - Nathan’s Startup
>>>>  * James Xu - Alibaba
>>>>  * Jason Jackson - Twitter
>>>>  * Andy Feng - Yahoo!
>>>>  * Flip Kromer - Infochimps
>>>>  * David Lao - Microsoft
>>>>  * P. Taylor Goetz - Health Market Science
>>>> 
>>>> == Sponsors ==
>>>> 
>>>> === Champion ===
>>>> 
>>>>  * Doug Cutting  <cutting at apache dot org>
>>>> 
>>>> === Nominated Mentors ===
>>>> 
>>>> * Ted Dunning <tdunning at maprtech.com>
>>>> * Arvind Prabhakar <arvind at apache dot org>
>>>> * Devaraj Das <ddas at hortonworks dot com>
>>>> 
>>>> === Sponsoring Entity ===
>>>> 
>>>> The Apache Incubator
>> 
>> 
>> 
>> --
>> Twitter: @nathanmarz
>> http://nathanmarz.com
>> 


