storm-dev mailing list archives

From "Huang, Roger" <rohu...@visa.com>
Subject RE: [DISCUSS] Pulling "Contrib" Modules into Apache
Date Wed, 26 Feb 2014 22:19:26 GMT
Bobby,
I vote to include both storm-yarn and storm-deploy.
Roger


-----Original Message-----
From: Brian O'Neill [mailto:boneill42@gmail.com] On Behalf Of Brian O'Neill
Sent: Wednesday, February 26, 2014 3:39 PM
To: dev@storm.incubator.apache.org
Cc: user@storm.incubator.apache.org
Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache


Bobby,

FWIW, I'd love to see storm-yarn inside.  I think we could definitely make things easier
on the end-user if they were more cohesive.

e.g. Imagine if we had "storm launch yarn" inside of $storm/bin that would kick off a storm-yarn
launch, with whatever version was built.  It would likely simplify the "create-tarball"
and storm-yarn getStormConfig process as well.

-brian

---
Brian O'Neill
Chief Technology Officer

Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  • healthmarketscience.com

This information transmitted in this email message is for the intended recipient only and
may contain confidential and/or privileged material. If you received this email in error and
are not the intended recipient, or the person responsible to deliver it to the intended recipient,
please contact the sender at the email above and delete this email and any attachments and
destroy any copies thereof. Any review, retransmission, dissemination, copying or other use
of, or taking any action in reliance upon, this information by persons or entities other than
the intended recipient is strictly prohibited.
 






On 2/26/14, 4:25 PM, "Bobby Evans" <evans@yahoo-inc.com> wrote:

>I totally agree and I am +1 on bringing these spout/trident pieces in, 
>assuming there are committers to support them.
>
>I am also curious about how people feel about pulling in other projects 
>like storm-starter, storm-deploy, storm-mesos, and storm-yarn?
>
>Storm-starter in my opinion seems more like documentation and it would 
>be nice to pull in so that it stays up to date with storm itself, just 
>like the documentation.
>
>The others are more of ways to run storm in different environments.  
>They seem like there could be a lot of coupling between them and storm 
>as storm evolves, and they kind of fit with "integrate storm with 
>*Technology X*" except X in this case is a compute environment instead 
>of a data source or store. But then again we also just shot down a 
>request to create juju charms for storm.
>
>-Bobby
>
>From: "P. Taylor Goetz" <ptgoetz@gmail.com>
>Reply-To: <dev@storm.incubator.apache.org>
>Date: Wednesday, February 26, 2014 at 1:21 PM
>To: <dev@storm.incubator.apache.org>
>Cc: "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
>Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache
>
>Thanks for the feedback Bobby.
>
>To clarify, I'm mainly talking about spout/bolt/trident state 
>implementations that integrate storm with *Technology X*, where 
>*Technology X* is not a fundamental part of storm.
>
>Examples would be technologies that are part of or related to the 
>Hadoop/Big Data ecosystem and enable the Lambda Architecture, e.g.: 
>Kafka, HDFS, HBase, Cassandra, etc.
>
>The idea behind having one or more Storm committers act as a "sponsor" 
>is to make sure new additions are done carefully and with good reason. 
>To add a new module, it would require committer/PPMC consensus, and 
>assignment of one or more sponsors. Part of a sponsor's job would be to 
>ensure that a module is maintained, which would require enough 
>familiarity with the code to support it long term. If a new module was 
>proposed, but no committers were willing to act as a sponsor, it would 
>not be added.
>
>It would be the Committers'/PPMC's responsibility to make sure things 
>don't get out of hand, and to do something about it if they do.
>
>Here's an old Hadoop JIRA thread [1] discussing the addition of Hive as 
>a contrib module, similar to what happened with HBase as Bobby pointed out.
>Some interesting points are brought up. The difference here is that 
>both HBase and Hive were pretty big codebases relative to Hadoop. With 
>spout/bolt/state implementations I doubt we'd see anything along that 
>scale.
>
>- Taylor
>
>[1] https://issues.apache.org/jira/browse/HADOOP-3601
>
>
>On Feb 26, 2014, at 12:35 PM, Bobby Evans <evans@yahoo-inc.com> wrote:
>
>I can see a lot of value in having a distribution of storm that comes 
>with batteries included, everything is tested together and you know it 
>works.  But I don't see much long term developer benefit in building 
>them all together.  If there is strong coupling between storm and these 
>external projects so that they break when storm changes then we need to 
>understand the coupling and decide if we want to reduce that coupling 
>by stabilizing APIs, improving version numbering and release process, 
>etc.; or if the functionality is something that should be offered as a 
>base service in storm.
>
>I can see politically the value of giving these other projects a home 
>in Apache, and making them sub-projects is the simplest route to that.  
>I'd love to have storm on yarn inside Apache.  I just don't want to go 
>overboard with it.  There was a time when HBase was a "contrib" module 
>under Hadoop along with a lot of other things, and the Apache board 
>came and told Hadoop to break it up.
>
>Bringing storm-kafka into storm does not sound like it will solve much 
>from a developer's perspective, because there is at least as much 
>coupling with kafka as there is with storm.  I can see how it is a huge 
>amount of overhead and pain to set up a new project just for a few 
>hundred lines of code, as such I am in favor of pulling in closely 
>related projects, especially those that are spouts and state 
>implementations. I just want to be sure that we do it carefully, with a 
>good reason, and with enough people who are familiar with the code to 
>support it long term.
>
>If it starts to look like we are pulling in too many projects perhaps 
>we should look at something more like the bigtop project 
>https://bigtop.apache.org/ which produces a tested distribution of 
>Hadoop with many different sub-projects included in it.
>
>I am also a bit concerned about these sub-projects becoming second 
>class citizens, where we break something, but because the build is off 
>by default we don't know it.  I would prefer that they are built and 
>tested by default.  If the build and test time starts to take too long, 
>to me that means we need to start wondering if we have too many contrib modules.
>
>-Bobby
>
>From: Brian Enochson <brian.enochson@gmail.com>
>Reply-To: "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
>Date: Tuesday, February 25, 2014 at 9:50 PM
>To: "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
>Cc: "dev@storm.incubator.apache.org" <dev@storm.incubator.apache.org>
>Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache
>
>hi,
>  I am in agreement with Taylor and believe I understand his intent. An 
>incredible tool/framework/application like Storm is only enhanced and 
>gains value from the number of well maintained and vetted modules that 
>can be used for integration and adding further functionality.
> I am relatively new to the Storm community but have spent quite some 
>time reviewing contributing modules out there, reviewing various 
>duplicates and running into some version incompatibilities. I 
>understand the need to keep Storm itself pure, but do think there needs 
>to be some structure and governance added to the contributing modules. 
>Look at the benefit a tool like npm brings to the node community.
> I like the idea of sponsorship, vetting and a community vote. I, as I am 
>sure many would be, am willing to offer support and time to work 
>through how to set this up and to help with the implementation if it is 
>decided to pursue some solution.
> I hope these views are taken in the spirit they are made, to make this 
>incredible system even better along with the surrounding eco-system.
>
>Thanks,
>Brian
>
>
>On Tue, Feb 25, 2014 at 9:36 PM, P. Taylor Goetz <ptgoetz@gmail.com> wrote:
>Just to be clear (and play a little Devil's advocate :) ), I'm not 
>suggesting that whatever a "contrib" project/module/subproject might 
>become, be a clearinghouse for anything Storm-related.
>
>I see it as something that is well-vetted by the Storm community, 
>subject to PPMC review, vote, etc. Entry would require community 
>review, PPMC review, and in some cases ASF IP clearance/legal review. 
>Anything added would require some level of commitment from the 
>PPMC/committers to provide some level of support.
>
>In other words, nothing "willy-nilly".
>
>One option could be that any module added require (X > 0) number of 
>committers to volunteer as "sponsors" for the module, and commit to 
>maintaining it.
>
>That being said, I don't see storm-kafka being any different from 
>anything else that provides integration points for Storm.
>
>-Taylor
>
>
>On Feb 25, 2014, at 7:53 PM, Nathan Marz <nathan@nathanmarz.com> wrote:
>
>I'm only +1 for pulling in storm-kafka and updating it. Other projects 
>put these contrib modules in a "contrib" folder and keep them managed 
>as completely separate codebases. As it's not actually a "module" 
>necessary for Storm, there's an argument there for doing it that way 
>rather than via the multi-module route.
>
>
>On Tue, Feb 25, 2014 at 4:39 PM, Milinda Pathirage <mpathira@umail.iu.edu> wrote:
>Hi Taylor,
>
>I'm +1 for pulling these external libraries into the Apache codebase. This 
>will certainly benefit the Storm community. I would also like to contribute to 
>this process.
>
>Thanks
>Milinda
>
>On Tue, Feb 25, 2014 at 5:28 PM, P. Taylor Goetz <ptgoetz@gmail.com> wrote:
>A while back I opened STORM-206 [1] to capture ideas for pulling in 
>"contrib" modules to the Apache codebase.
>
>In the past, we had the storm-contrib github project [2] which 
>subsequently got broken up into individual projects hosted on the 
>stormprocessor github group [3] and elsewhere.
>
>The problem with this approach is that in certain cases it led to code 
>rot (modules not being updated in step with Storm's API), fragmentation 
>(multiple similar modules with the same name), and confusion.
>
>A good example of this is the storm-kafka module [4], since it is a 
>widely used component. Because storm-contrib wasn't being tagged in 
>github, a lot of users had trouble working out which versions of 
>storm it was compatible with. Some users built off specific commit hashes, 
>some forked, and a few even pushed custom builds to repositories such 
>as clojars. With kafka 0.8 now available, there are two main storm-kafka 
>projects: the original (compatible with kafka 0.7) and an updated fork [5] 
>(compatible with kafka 0.8).
>
>My intention is not to find fault in any way, but rather to point out 
>the resulting pain, and work toward a better solution.
>
>I think it would be beneficial to the Storm user community to have 
>certain commonly used modules like storm-kafka brought into the Apache 
>Storm project. Another benefit worth considering is the licensing/legal 
>oversight that the ASF provides, which is important to many users.
>
>If this is something we want to do, then the big question becomes what 
>sort of governance process needs to be established to ensure that such 
>things are properly maintained.
>
>Some random thoughts, questions, etc. that jump to mind include:
>
>What to call these things: "contrib modules", "connectors", "integration 
>modules", etc.?
>Build integration: I imagine they would be a multi-module submodule of 
>the main maven build. Probably turned off by default and enabled by a 
>maven profile.
>Governance: Have one or more committer volunteers responsible for 
>maintenance, merging patches, etc.? Proposal process for pulling new 
>modules?
>
>
>I look forward to hearing others' opinions.
>
>- Taylor
>
>
>[1] https://issues.apache.org/jira/browse/STORM-206
>[2] https://github.com/nathanmarz/storm-contrib
>[3] https://github.com/stormprocessor
>[4] https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka
>[5] https://github.com/wurstmeister/storm-kafka-0.8-plus
>
>
>
>--
>Milinda Pathirage
>
>PhD Student | Research Assistant
>School of Informatics and Computing | Data to Insight Center, Indiana University
>
>twitter: milindalakmal
>skype: milinda.pathirage
>blog: http://milinda.pathirage.org
>
>
>
>--
>Twitter: @nathanmarz
>http://nathanmarz.com
>

