storm-dev mailing list archives

From Robert Lee
Subject Re: [DISCUSS] Pulling "Contrib" Modules into Apache
Date Wed, 26 Feb 2014 18:43:27 GMT
To build on Bobby's statement, it does pain me as a user to have to search
outside of the project modules to find a compatible build that works with
the latest version of storm as well as the latest module version. For
widely used integrations such as hbase, cassandra, and kafka, I think the
contrib projects should be pulled into storm if they meet these stringent
criteria:

1) Several volunteer developers familiar with the code who can update it as
new versions arise
2) Fully implemented bolt/spout

" If the build and test time starts to take too long, to me that means we
need to start wondering if we have too many contrib modules." -- +1

I would be willing to volunteer with the cassandra backing map module
(especially with the latest CQL3 release).

On Wed, Feb 26, 2014 at 12:35 PM, Bobby Evans wrote:

> I can see a lot of value in having a distribution of storm that comes with
> batteries included, everything is tested together and you know it works.
>  But I don't see much long term developer benefit in building them all
> together.  If there is strong coupling between storm and these external
> projects so that they break when storm changes then we need to understand
> the coupling and decide if we want to reduce that coupling by stabilizing
> APIs, improving version numbering and release process, etc.; or if the
> functionality is something that should be offered as a base service in
> storm.
> I can see politically the value of giving these other projects a home in
> Apache, and making them sub-projects is the simplest route to that.  I'd
> love to have storm on yarn inside Apache.  I just don't want to go
> overboard with it.  There was a time when HBase was a "contrib" module
> under Hadoop along with a lot of other things, and the Apache board came
> and told Hadoop to break it up.
> Bringing storm-kafka into storm does not sound like it will solve much
> from a developer's perspective, because there is at least as much coupling
> with kafka as there is with storm.  I can see how it is a huge amount of
> overhead and pain to set up a new project just for a few hundred lines of
> code; as such, I am in favor of pulling in closely related projects,
> especially those that are spouts and state implementations. I just want to
> be sure that we do it carefully, with a good reason, and with enough people
> who are familiar with the code to support it long term.
> If it starts to look like we are pulling in too many projects perhaps we
> should look at something more like the bigtop project, which produces a
> tested distribution of Hadoop with many different sub-projects included in it.
> I am also a bit concerned about these sub-projects becoming second class
> citizens, where we break something, but because the build is off by default
> we don't know it.  I would prefer that they are built and tested by
> default.  If the build and test time starts to take too long, to me that
> means we need to start wondering if we have too many contrib modules.
> --Bobby
> From: Brian Enochson
> Date: Tuesday, February 25, 2014 at 9:50 PM
> Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache
> hi,
>    I am in agreement with Taylor and believe I understand his intent. An
> incredible tool/framework/application like Storm is only enhanced and gains
> value from the number of well maintained and vetted modules that can be
> used for integration and adding further functionality.
>   I am relatively new to the Storm community but have spent quite some
> time reviewing contributing modules out there, reviewing various duplicates
> and running into some version incompatibilities. I understand the need to
> keep Storm itself pure, but do think there needs to be some structure and
> governance added to the contributing modules. Look at the benefit a tool
> like npm brings to the node community.
>   I like the idea of sponsorship, vetting and a community vote.  I, as I'm
> sure many would be, am willing to offer support and time to working through
> how to set this up and helping with the implementation if it is decided to
> pursue some solution.
>   I hope these views are taken in the spirit they are made, to make this
> incredible system even better along with the surrounding eco-system.
> Thanks,
> Brian
> On Tue, Feb 25, 2014 at 9:36 PM, P. Taylor Goetz wrote:
> Just to be clear (and play a little Devil's advocate :) ), I'm not
> suggesting that whatever a "contrib" project/module/subproject might
>  become, be a clearinghouse for anything Storm-related.
> I see it as something that is well-vetted by the Storm community, subject
> to PPMC review, vote, etc. Entry would require community review, PPMC
> review, and in some cases ASF IP clearance/legal review. Anything added
> would require some level of commitment from the PPMC/committers to provide
> some level of support.
> In other words, nothing "willy-nilly".
> One option could be that any module added require (X > 0)  number of
> committers to volunteer as "sponsor"s for the module, and commit to
> maintaining it.
> That being said, I don't see storm-kafka being any different from anything
> else that provides integration points for Storm.
> -Taylor
> On Feb 25, 2014, at 7:53 PM, Nathan Marz wrote:
> I'm only +1 for pulling in storm-kafka and updating it. Other projects put
> these contrib modules in a "contrib" folder and keep them managed as
> completely separate codebases. As it's not actually a "module" necessary
> for Storm, there's an argument there for doing it that way rather than via
> the multi-module route.
> On Tue, Feb 25, 2014 at 4:39 PM, Milinda Pathirage wrote:
> Hi Taylor,
> I'm +1 for pulling these external libraries into the Apache codebase. This
> will certainly benefit the Storm community. I would also like to contribute
> to this process.
> Thanks
> Milinda
> On Tue, Feb 25, 2014 at 5:28 PM, P. Taylor Goetz wrote:
> > A while back I opened STORM-206 [1] to capture ideas for pulling in
> > "contrib" modules to the Apache codebase.
> >
> > In the past, we had the storm-contrib github project [2] which subsequently
> > got broken up into individual projects hosted on the stormprocessor github
> > group [3] and elsewhere.
> >
> > The problem with this approach is that in certain cases it led to code rot
> > (modules not being updated in step with Storm's API), fragmentation
> > (multiple similar modules with the same name), and confusion.
> >
> > A good example of this is the storm-kafka module [4], since it is a widely
> > used component. Because storm-contrib wasn't being tagged in github, a lot
> > of users had trouble determining which versions of storm it was compatible
> > with. Some users built off specific commit hashes, some forked, and a
> > few even pushed custom builds to repositories such as clojars. With kafka
> > 0.8 now available, there are two main storm-kafka projects, the original
> > (compatible with kafka 0.7) and an updated fork [5] (compatible with kafka
> > 0.8).
> >
> > My intention is not to find fault in any way, but rather to point out the
> > resulting pain, and work toward a better solution.
> >
> > I think it would be beneficial to the Storm user community to have certain
> > commonly used modules like storm-kafka brought into the Apache Storm
> > project. Another benefit worth considering is the licensing/legal oversight
> > that the ASF provides, which is important to many users.
> >
> > If this is something we want to do, then the big question becomes what sort
> > of governance process needs to be established to ensure that such things are
> > properly maintained.
> >
> > Some random thoughts, questions, etc. that jump to mind include:
> >
> > What to call these things: "contrib modules", "connectors", "integration
> > modules", etc.?
> > Build integration: I imagine they would be a multi-module submodule of the
> > main maven build. Probably turned off by default and enabled by a maven
> > profile.
> > Governance: Have one or more committer volunteers responsible for
> > maintenance, merging patches, etc.? Proposal process for pulling new
> > modules?
> >
> >
> > I look forward to hearing others' opinions.
> >
> > - Taylor
> >
> >
> > [1]
> > [2]
> > [3]
> > [4]
> > [5]
> --
> Milinda Pathirage
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
> twitter: milindalakmal
> skype: milinda.pathirage
> --
> Twitter: @nathanmarz
