From: Otto Fowler
Date: Tue, 21 Apr 2020 10:12:43 -0700
Subject: Re: Development Activity has dropped to effectively 0, what should we do?
To: dev@metron.apache.org

I think the difference is the maintenance of the core of metron that *has*
to be, and other things that may still be done, but will be worked on for
their merits or by community need and not be required for everything.

On April 21, 2020 at 10:29:24, Justin Leet (justinjleet@gmail.com) wrote:

How we install depends on what we're choosing to keep around. My concern is
getting core Metron's scope down to a supportable level. This entire
conversation is probably just a thought experiment until we properly limit
the rest of our scope. It's putting the cart before the horse.

I want to emphasize this, because we're having a discussion about how to
install something that in many ways doesn't actually exist yet.

A lot of the install complexity comes from managing so many moving parts at
once (ES/Solr, the UI, Kerberos, etc.). If we cut that down, I'm not sure we
need a big installer to manage everything. Plenty of projects trust people
to be able to run convenience scripts and shell commands.

Again, I think this is an academic discussion until we figure out our
overall project direction.

On Tue, Apr 21, 2020 at 10:02 AM Nick Allen wrote:

> Hi Tom -
>
> > Do you or anyone have enough experience to judge if it is possible to
> > leverage Ansible as a replacement to deploy a working cluster?
>
> Yes, I worked a lot on the Ansible mechanism in the early days of Metron.
> This was the primary deployment mechanism before we had the Ambari MPack.
>
> We found it very difficult to use Ansible to create a one-size-fits-all
> deployment solution. It's possible, but very difficult to get a solution
> that doesn't take close monitoring and manual workarounds when attempting
> to use it across environments of different sizes and shapes. In terms of
> usability, the Ambari MPack was a big step up in my opinion.
>
> > perhaps a dedicated docker image that is designed to connect with other
> > dockerized applications such as Storm, Kafka, etc.?
>
> Yes, I think that would be the way to go for a dev environment. We would be
> able to use community supported containers for most of our underlying
> platform needs. Unfortunately, this alone would not help anyone deploy
> Metron on a cluster.
>
> On Tue, Apr 21, 2020 at 9:08 AM Yerex, Tom wrote:
>
> > Hi Nick,
> >
> > I see there is a lot of work done using Ansible in the repository. Do you
> > or anyone have enough experience to judge if it is possible to leverage
> > Ansible as a replacement to deploy a working cluster?
> >
> > Now that I am typing this out, I wonder if docker might be a solution
> > that would work? I don't have much experience with docker, perhaps a
> > dedicated docker image that is designed to connect with other dockerized
> > applications such as Storm, Kafka, etc.?
> >
> > --Tom.
> >
> > On 2020-04-17, 11:27 AM, "Nick Allen" wrote:
> >
> > This is a good discussion and one that I haven't fully grappled with in
> > my own mind yet. I'll have more to add, but I just want to chime in on
> > the topic of Ambari at this point.
> >
> > ### Ambari and the Paywall
> >
> > The problem with Ambari is that its installation mechanism requires a
> > repository of compiled packages (RPMs, DEBs, etc.). To install the
> > underlying platform dependencies (like Kafka, HBase, Storm, Zk, etc.) we
> > relied on binary packages that were made freely available by
> > Cloudera/Hortonworks.
> > As of this past January, those packages are now behind a paywall.
> >
> > Due to the paywall, installing your own HDP cluster with Ambari is now
> > effectively dead. I am not sure if legacy versions of Kafka, HBase,
> > Storm, etc. will continue to be freely available, but even if so, we
> > cannot continue to rely on this mechanism if new versions and security
> > updates will not be made available.
> >
> > The Apache Metron project does not publish compiled binaries or packages
> > either. We do make the code freely available to allow users to build and
> > publish their own Metron packages. But even with this capability, unless
> > you have a means to install the underlying platform dependencies via
> > Ambari, installing Metron with Ambari has little value.
> >
> > Unfortunately, I don't see a feasible path forward for Metron's Ambari
> > MPack.
> >
> > ### Dev Environment
> >
> > This not only impacts the users of Apache Metron, this impacts
> > contributors also. Our primary development environment relies on that
> > Ambari MPack. To continue development on any of the components of Apache
> > Metron, we would need to build an alternative development environment
> > that can function despite the paywall. That could take many shapes, but
> > in my opinion it would be a blocker for continuing any development on
> > Apache Metron, unfortunately.
> >
> > Please do let me know if anyone disagrees or can think of an alternative
> > approach that would allow the current Ambari MPack to remain viable.
> >
> > On Thu, Apr 16, 2020 at 4:34 PM Dima Kovalyov wrote:
> >
> > > - Dropping Ambari.
> > >
> > > I like the progress that Apache did with Ambari in 2.7. And I don't
> > > know a better installer/manager for all the services (we use other
> > > Hadoop eco services besides Metron).
> > >
> > > Sometimes it's buggy: agents get stuck, the server needs a reboot from
> > > time to time, or mpacks break some functionality. But overall I feel
> > > this is the direction for central management and orchestration.
> > >
> > > - Dima
> > >
> > > On Wed, Apr 15, 2020, 12:45 Justin Leet wrote:
> > >
> > > > This is a bit off the top of my head, but I'd agree with pretty much
> > > > all of the points on what's bringing a lot of overhead. There's
> > > > probably also a worthwhile discussion about what value we're shooting
> > > > for the project to provide to people that influences what stays/goes.
> > > >
> > > > Thinking out loud a bit
> > > >
> > > > - Dropping Storm and moving to Spark drops the very hard to
> > > >   tune/manage/troubleshoot Storm.
> > > > - Dropping the UIs (and making SQL the external interface) pretty
> > > >   much implies dropping the REST APIs and ES/Solr. ES/Solr have been
> > > >   a giant source of dev heartache on the project and they exist
> > > >   primarily for the real time use case. People can build whatever UIs
> > > >   or use existing tools against Parquet/Hive/whatever.
> > > > - Dropping Ambari. It's a complex beast to install because of how
> > > >   many components we have. Dropping the above makes our install much
> > > >   easier and should alleviate the need for a complex installer.
> > > >
> > > > At that point, we're basically left with
> > > >
> > > > - Some Spark for parse -> enrich -> output
> > > > - The profiler
> > > > - Stellar
> > > > - Probably some other misc stuff (sensors, Bro Kafka plugin, etc.)
> > > >
> > > > At a glance, that seems almost an order of magnitude smaller than
> > > > what we currently try to handle.
> > > >
> > > > I'm not really sure what an appropriate way to handle the profiler
> > > > is. I've barely touched the code for it, so anything I say is a vague
> > > > guess.
> > > >
> > > > On Wed, Apr 8, 2020 at 7:38 PM Yerex, Tom wrote:
> > > >
> > > > > To me Metron is big and broad in the scope of technology required
> > > > > to get it running. If things were more modular that would go a long
> > > > > way to reducing the learning curve or at least putting it into
> > > > > smaller bites (and it might encourage more people to get involved).
> > > > >
> > > > > If the UI were an add-on module in another project, it would have
> > > > > made it easier for me and it could also encourage my hypothetical
> > > > > buddy who is a web developer expert to get involved since he could
> > > > > focus on the web-ui module instead of trying to tackle all the
> > > > > other pieces that are probably not part of his bailiwick.
> > > > >
> > > > > Stellar is very intriguing, maybe that is not unique to Metron? The
> > > > > architecture of Metron with respect to parsing, enriching, etc.,
> > > > > makes a lot of sense to anyone I talk with. These two aspects of
> > > > > Metron seem like standout examples that make for a powerful
> > > > > platform to develop on.
> > > > >
> > > > > Thanks for continuing this discussion,
> > > > >
> > > > > Tom.
> > > > >
> > > > > On 2020-04-08 15:32:46-07:00 Casey Stella wrote:
> > > > >
> > > > > As far as I know there is no minimum bar of development activity to
> > > > > keep a project open. I think we would all be grateful for any
> > > > > investment that you or your organization would want to make.
> > > > > It also occurs to me that your observation is absolutely spot on:
> > > > > we have a LOT of moving parts.
> > > > > I see some deficiencies here:
> > > > >
> > > > > * We depend on a lot of the various hadoop ecosystem projects and
> > > > >   they have to work together very precisely:
> > > > >     * This makes for a system that is hard to install.
> > > > >     * This also makes for a system which is hard to tune/manage
> > > > > * We have a large surface area of coverage
> > > > >     * We have an installer, backend system and front-end UI, which
> > > > >       stretches our developers a bit thin, especially since there
> > > > >       isn't even interest in those systems
> > > > >
> > > > > Perhaps a reconsideration of the scope and technologies that we use
> > > > > would be merited? If we were to decide to, for instance:
> > > > >
> > > > > * Consolidate scope: focus on a viable backend/API rather than a UI
> > > > > * Consolidate technology: reposition ourselves on top of Spark as a
> > > > >   consolidated streaming/batch system
> > > > > * Make SQL our external interface: write out to parquet + the Hive
> > > > >   metastore and let users pin up presto tables or hive tables as
> > > > >   they see fit
> > > > >
> > > > > This might reduce some of our surface area and make it more viable
> > > > > to get started?
> > > > > Anyway, just some thoughts.
> > > > > Casey
> > > > >
> > > > > On Wed, Apr 8, 2020 at 6:20 PM Yerex, Tom <tom.yerex@ubc.ca> wrote:
> > > > > Hi Casey,
> > > > >
> > > > > I'm new here and new to contributing to an open source project.
> > > > > Thus far my contribution has been questions, however the steep
> > > > > learning curve has had me working to understand all the moving
> > > > > parts for the last 18 months and I see that as a big investment by
> > > > > my organization.
> > > > >
> > > > > What is a level that would be viable?
> > > > >
> > > > > If my organization were to contribute I don't know that it would be
> > > > > soon enough or at the volume that is recognized as viable, which is
> > > > > why I ask the question.
> > > > >
> > > > > On 2020-04-08 15:05:51-07:00 Casey Stella wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > When composing the board report today, I realized that we have
> > > > > effectively had no development in the last quarter on this project.
> > > > > Please be aware that I say this without a shred of blame or
> > > > > judgement (especially so considering I have not contributed in a
> > > > > long time). That being said, I would like to pose the question to
> > > > > the community:
> > > > >
> > > > > Do we feel that this project is viable? If so, how are we going to
> > > > > spur new contributions? If not, then should we begin the process to
> > > > > fold the project?
> > > > >
> > > > > Best,
> > > > >
> > > > > Casey