storm-dev mailing list archives

From Bobby Evans <>
Subject Re: [DISCUSS] Pulling "Contrib" Modules into Apache
Date Wed, 26 Feb 2014 17:35:46 GMT
I can see a lot of value in having a distribution of storm that comes with batteries included,
everything is tested together and you know it works.  But I don’t see much long term developer
benefit in building them all together.  If there is strong coupling between storm and these
external projects so that they break when storm changes then we need to understand the coupling
and decide if we want to reduce that coupling by stabilizing APIs, improving version numbering
and release process, etc.; or if the functionality is something that should be offered as
a base service in storm.

I can see politically the value of giving these other projects a home in Apache, and making
them sub-projects is the simplest route to that.  I’d love to have storm on yarn inside
Apache.  I just don’t want to go overboard with it.  There was a time when HBase was a “contrib”
module under Hadoop along with a lot of other things, and the Apache board came and told Hadoop
to break it up.

Bringing storm-kafka into storm does not sound like it will solve much from a developer’s
perspective, because there is at least as much coupling with kafka as there is with storm.
I can see how it is a huge amount of overhead and pain to set up a new project just for a
few hundred lines of code; as such, I am in favor of pulling in closely related projects, especially
those that are spouts and state implementations. I just want to be sure that we do it carefully,
with a good reason, and with enough people who are familiar with the code to support it long term.

If it starts to look like we are pulling in too many projects, perhaps we should look at something
more like the Bigtop project, which produces a tested distribution
of Hadoop with many different sub-projects included in it.

I am also a bit concerned about these sub-projects becoming second class citizens, where we
break something, but because the build is off by default we don’t know it.  I would prefer
that they are built and tested by default.  If the build and test time starts to take too
long, to me that means we need to start wondering if we have too many contrib modules.


From: Brian Enochson
Date: Tuesday, February 25, 2014 at 9:50 PM
Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache

   I am in agreement with Taylor and believe I understand his intent. An incredible tool/framework/application
like Storm is only enhanced and gains value from the number of well maintained and vetted
modules that can be used for integration and adding further functionality.
  I am relatively new to the Storm community but have spent quite some time reviewing contributing
modules out there, reviewing various duplicates and running into some version incompatibilities.
I understand the need to keep Storm itself pure, but do think there needs to be some structure
and governance added to the contributing modules. Look at the benefit a tool like npm brings
to the node community.
  I like the idea of sponsorship, vetting and a community vote.  I, as I am sure many would be,
am willing to offer support and time to working through how to set this up and helping with
the implementation if it is decided to pursue some solution.
  I hope these views are taken in the spirit they are made, to make this incredible system
even better along with the surrounding eco-system.


On Tue, Feb 25, 2014 at 9:36 PM, P. Taylor Goetz <<>>
Just to be clear (and play a little Devil’s advocate :) ), I’m not suggesting that whatever
a “contrib” project/module/subproject might  become, be a clearinghouse for anything Storm-related.

I see it as something that is well-vetted by the Storm community, subject to PPMC review,
vote, etc. Entry would require community review, PPMC review, and in some cases ASF IP clearance/legal
review. Anything added would require some level of commitment from the PPMC/committers to
provide some level of support.

In other words, nothing “willy-nilly”.

One option could be that any module added requires some number (X > 0) of committers to volunteer
as “sponsors” for the module, and commit to maintaining it.

That being said, I don’t see storm-kafka being any different from anything else that provides
integration points for Storm.


On Feb 25, 2014, at 7:53 PM, Nathan Marz <<>>

I'm only +1 for pulling in storm-kafka and updating it. Other projects put these contrib modules
in a "contrib" folder and keep them managed as completely separate codebases. As it's not
actually a "module" necessary for Storm, there's an argument there for doing it that way rather
than via the multi-module route.

On Tue, Feb 25, 2014 at 4:39 PM, Milinda Pathirage <<>>
Hi Taylor,

I'm +1 for pulling these external libraries into the Apache codebase. This
will certainly benefit the Storm community. I would also like to contribute to
this process.


On Tue, Feb 25, 2014 at 5:28 PM, P. Taylor Goetz <<>>
> A while back I opened STORM-206 [1] to capture ideas for pulling in
> "contrib" modules to the Apache codebase.
> In the past, we had the storm-contrib github project [2] which subsequently
> got broken up into individual projects hosted on the stormprocessor github
> group [3] and elsewhere.
> The problem with this approach is that in certain cases it led to code rot
> (modules not being updated in step with Storm's API), fragmentation
> (multiple similar modules with the same name), and confusion.
> A good example of this is the storm-kafka module [4], since it is a widely
> used component. Because storm-contrib wasn't being tagged in github, a lot
> of users had trouble reconciling with which versions of storm it was
> compatible. Some users built off specific commit hashes, some forked, and a
> few even pushed custom builds to repositories such as clojars. With kafka
> 0.8 now available, there are two main storm-kafka projects, the original
> (compatible with kafka 0.7) and an updated fork [5] (compatible with kafka
> 0.8).
> My intention is not to find fault in any way, but rather to point out the
> resulting pain, and work toward a better solution.
> I think it would be beneficial to the Storm user community to have certain
> commonly used modules like storm-kafka brought into the Apache Storm
> project. Another benefit worth considering is the licensing/legal oversight
> that the ASF provides, which is important to many users.
> If this is something we want to do, then the big question becomes what sort
> of governance process needs to be established to ensure that such things are
> properly maintained.
> Some random thoughts, questions, etc. that jump to mind include:
> What to call these things: "contrib modules", "connectors", "integration
> modules", etc.?
> Build integration: I imagine they would be a multi-module submodule of the
> main maven build. Probably turned off by default and enabled by a maven
> profile.
> Governance: Have one or more committer volunteers responsible for
> maintenance, merging patches, etc.? Proposal process for pulling new
> modules?
> I look forward to hearing others' opinions.
> - Taylor
> [1]
> [2]
> [3]
> [4]
> [5]

Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage

Twitter: @nathanmarz
