Date: Wed, 26 Feb 2014 14:20:41 -0500
Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache
From: Brian O'Neill
Reply-To: dev@storm.incubator.apache.org

I'll pile on. (+1 to Robert's sentiments)

Taylor, just give the word and I can start to transition the IP for
storm-cassandra and storm-cassandra-cql. I can also lend a hand supporting
them.

-brian

---
Brian O'Neill
Chief Technology Officer
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 • healthmarketscience.com

This information transmitted in this email message is for the intended
recipient only and may contain confidential and/or privileged material. If
you received this email in error and are not the intended recipient, or the
person responsible to deliver it to the intended recipient, please contact
the sender at the email above and delete this email and any attachments and
destroy any copies thereof. Any review, retransmission, dissemination,
copying or other use of, or taking any action in reliance upon, this
information by persons or entities other than the intended recipient is
strictly prohibited.
On 2/26/14, 1:43 PM, "Robert Lee" wrote:

>To build on Bobby's statement, it does pain me as a user to have to search
>outside of the project modules to find a compatible build that works with
>the latest version of storm as well as the latest module version. However,
>in instances such as hbase, cassandra, kafka, etc., I think these commonly
>used contrib projects should be pulled into storm if they meet stringent
>criteria of:
>
>1) Several volunteer developers familiar with code to update as new
>versions arise
>2) Fully implemented bolt/spout
>
>" If the build and test time starts to take too long, to me that means we
>need to start wondering if we have too many contrib modules." -- +1
>
>I would be willing to volunteer with the cassandra backing map module
>(especially with the latest CQL3 release).
>
>
>On Wed, Feb 26, 2014 at 12:35 PM, Bobby Evans wrote:
>
>> I can see a lot of value in having a distribution of storm that comes with
>> batteries included, everything is tested together and you know it works.
>> But I don't see much long term developer benefit in building them all
>> together. If there is strong coupling between storm and these external
>> projects so that they break when storm changes then we need to understand
>> the coupling and decide if we want to reduce that coupling by stabilizing
>> APIs, improving version numbering and release process, etc.; or if the
>> functionality is something that should be offered as a base service in
>> storm.
>>
>> I can see politically the value of giving these other projects a home in
>> Apache, and making them sub-projects is the simplest route to that. I'd
>> love to have storm on yarn inside Apache. I just don't want to go
>> overboard with it. There was a time when HBase was a "contrib" module
>> under Hadoop along with a lot of other things, and the Apache board came
>> and told Hadoop to break it up.
>>
>> Bringing storm-kafka into storm does not sound like it will solve much
>> from a developer's perspective, because there is at least as much coupling
>> with kafka as there is with storm. I can see how it is a huge amount of
>> overhead and pain to set up a new project just for a few hundred lines of
>> code; as such, I am in favor of pulling in closely related projects,
>> especially those that are spouts and state implementations. I just want to
>> be sure that we do it carefully, with a good reason, and with enough people
>> who are familiar with the code to support it long term.
>>
>> If it starts to look like we are pulling in too many projects perhaps we
>> should look at something more like the bigtop project
>> https://bigtop.apache.org/ which produces a tested distribution of Hadoop
>> with many different sub-projects included in it.
>>
>> I am also a bit concerned about these sub-projects becoming second class
>> citizens, where we break something, but because the build is off by default
>> we don't know it. I would prefer that they are built and tested by
>> default. If the build and test time starts to take too long, to me that
>> means we need to start wondering if we have too many contrib modules.
>>
>> --Bobby
>>
>> From: Brian Enochson <brian.enochson@gmail.com>
>> Reply-To: "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
>> Date: Tuesday, February 25, 2014 at 9:50 PM
>> To: "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
>> Cc: "dev@storm.incubator.apache.org"
>> Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache
>>
>> hi,
>> I am in agreement with Taylor and believe I understand his intent.
>> An incredible tool/framework/application like Storm is only enhanced and
>> gains value from the number of well maintained and vetted modules that can
>> be used for integration and adding further functionality.
>>
>> I am relatively new to the Storm community but have spent quite some
>> time reviewing contributing modules out there, reviewing various duplicates
>> and running into some version incompatibilities. I understand the need to
>> keep Storm itself pure, but do think there needs to be some structure and
>> governance added to the contributing modules. Look at the benefit a tool
>> like npm brings to the node community.
>>
>> I like the idea of sponsorship, vetting and a community vote. I, as I am
>> sure many would be, am willing to offer support and time to working through
>> how to set this up and helping with the implementation if it is decided to
>> pursue some solution.
>>
>> I hope these views are taken in the spirit they are made, to make this
>> incredible system even better along with the surrounding eco-system.
>>
>> Thanks,
>> Brian
>>
>>
>> On Tue, Feb 25, 2014 at 9:36 PM, P. Taylor Goetz wrote:
>> Just to be clear (and play a little Devil's advocate :) ), I'm not
>> suggesting that whatever a "contrib" project/module/subproject might
>> become, be a clearinghouse for anything Storm-related.
>>
>> I see it as something that is well-vetted by the Storm community, subject
>> to PPMC review, vote, etc. Entry would require community review, PPMC
>> review, and in some cases ASF IP clearance/legal review. Anything added
>> would require some level of commitment from the PPMC/committers to provide
>> some level of support.
>>
>> In other words, nothing "willy-nilly".
>>
>> One option could be that any module added require (X > 0) number of
>> committers to volunteer as "sponsor"s for the module, and commit to
>> maintaining it.
>>
>> That being said, I don't see storm-kafka being any different from anything
>> else that provides integration points for Storm.
>>
>> -Taylor
>>
>>
>> On Feb 25, 2014, at 7:53 PM, Nathan Marz <nathan@nathanmarz.com> wrote:
>>
>> I'm only +1 for pulling in storm-kafka and updating it. Other projects put
>> these contrib modules in a "contrib" folder and keep them managed as
>> completely separate codebases. As it's not actually a "module" necessary
>> for Storm, there's an argument there for doing it that way rather than via
>> the multi-module route.
>>
>>
>> On Tue, Feb 25, 2014 at 4:39 PM, Milinda Pathirage wrote:
>> Hi Taylor,
>>
>> I'm +1 for pulling these external libraries into the Apache codebase. This
>> will certainly benefit the Storm community. I would also like to contribute
>> to this process.
>>
>> Thanks
>> Milinda
>>
>> On Tue, Feb 25, 2014 at 5:28 PM, P. Taylor Goetz wrote:
>> > A while back I opened STORM-206 [1] to capture ideas for pulling in
>> > "contrib" modules to the Apache codebase.
>> >
>> > In the past, we had the storm-contrib github project [2] which
>> > subsequently got broken up into individual projects hosted on the
>> > stormprocessor github group [3] and elsewhere.
>> >
>> > The problem with this approach is that in certain cases it led to code
>> > rot (modules not being updated in step with Storm's API), fragmentation
>> > (multiple similar modules with the same name), and confusion.
>> >
>> > A good example of this is the storm-kafka module [4], since it is a
>> > widely used component. Because storm-contrib wasn't being tagged in
>> > github, a lot of users had trouble reconciling with which versions of
>> > storm it was compatible. Some users built off specific commit hashes,
>> > some forked, and a few even pushed custom builds to repositories such as
>> > clojars.
>> > With kafka 0.8 now available, there are two main storm-kafka projects,
>> > the original (compatible with kafka 0.7) and an updated fork [5]
>> > (compatible with kafka 0.8).
>> >
>> > My intention is not to find fault in any way, but rather to point out the
>> > resulting pain, and work toward a better solution.
>> >
>> > I think it would be beneficial to the Storm user community to have
>> > certain commonly used modules like storm-kafka brought into the Apache
>> > Storm project. Another benefit worth considering is the licensing/legal
>> > oversight that the ASF provides, which is important to many users.
>> >
>> > If this is something we want to do, then the big question becomes what
>> > sort of governance process needs to be established to ensure that such
>> > things are properly maintained.
>> >
>> > Some random thoughts, questions, etc. that jump to mind include:
>> >
>> > What to call these things: "contrib modules", "connectors", "integration
>> > modules", etc.?
>> > Build integration: I imagine they would be a multi-module submodule of
>> > the main maven build. Probably turned off by default and enabled by a
>> > maven profile.
>> > Governance: Have one or more committer volunteers responsible for
>> > maintenance, merging patches, etc.? Proposal process for pulling new
>> > modules?
>> >
>> >
>> > I look forward to hearing others' opinions.
>> >
>> > - Taylor
>> >
>> >
>> > [1] https://issues.apache.org/jira/browse/STORM-206
>> > [2] https://github.com/nathanmarz/storm-contrib
>> > [3] https://github.com/stormprocessor
>> > [4] https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka
>> > [5] https://github.com/wurstmeister/storm-kafka-0.8-plus
>>
>>
>>
>> --
>> Milinda Pathirage
>>
>> PhD Student | Research Assistant
>> School of Informatics and Computing | Data to Insight Center
>> Indiana University
>>
>> twitter: milindalakmal
>> skype: milinda.pathirage
>> blog: http://milinda.pathirage.org
>>
>>
>>
>> --
>> Twitter: @nathanmarz
>> http://nathanmarz.com