metron-dev mailing list archives

From Nick Allen <n...@nickallen.org>
Subject Re: Development Activity has dropped to effectively 0, what should we do?
Date Fri, 17 Apr 2020 18:26:50 GMT
This is a good discussion and one that I haven't fully grappled with in my
own mind yet. I'll have more to add, but I just want to chime in on the
topic of Ambari at this point.

### Ambari and the Paywall

The problem with Ambari is that its installation mechanism requires a
repository of compiled packages (RPMs, DEBs, etc.). To install the
underlying platform dependencies (Kafka, HBase, Storm, ZooKeeper, etc.) we
relied on binary packages that were made freely available by
Cloudera/Hortonworks. As of this past January, those packages are
behind a paywall.

Due to the paywall, installing your own HDP cluster with Ambari is now
effectively dead. I am not sure whether legacy versions of Kafka, HBase,
Storm, etc. will continue to be freely available, but even if so, we cannot
continue to rely on this mechanism if new versions and security updates
will not be made available.

The Apache Metron project does not publish compiled binaries or packages
either.  We do make the code freely available to allow users to build and
publish their own Metron packages.   But even with this capability, unless
you have a means to install the underlying platform dependencies via
Ambari, installing Metron with Ambari has little value.

Unfortunately, I don't see a feasible path forward for Metron's Ambari
MPack.

### Dev Environment

This impacts not only the users of Apache Metron but also its contributors.
Our primary development environment relies on that Ambari MPack. To
continue development on any of the components of Apache Metron, we would
need to build an alternative development environment that can function
despite the paywall. That could take many shapes, but in my opinion the
lack of one is a blocker for continuing any development on Apache Metron,
unfortunately.

Please do let me know if anyone disagrees or can think of an alternative
approach that would allow the current Ambari MPack to remain viable.


On Thu, Apr 16, 2020 at 4:34 PM Dima Kovalyov <dimdroll@gmail.com> wrote:

>   - Dropping Ambari.
>
> I like the progress that Apache did with Ambari in 2.7. And I don't know a
> better installer/manager for all the services (we use other Hadoop eco
> services besides Metron).
>
> Sometimes it's buggy: agents get stuck, the server needs a reboot from time
> to time, mpacks break some functionality. But overall I feel this is the
> direction for central management and orchestration.
>
> - Dima
>
> On Wed, Apr 15, 2020, 12:45 Justin Leet <justinjleet@gmail.com> wrote:
>
> > This is a bit off the top of my head, but I agree with pretty much all
> > of the points on what's bringing a lot of overhead.  There's probably also a
> > worthwhile discussion about what value we're shooting for the project to
> > provide to people that influences what stays/goes.
> >
> > Thinking out loud a bit
> >
> >    - Dropping Storm and moving to Spark drops the very hard to
> >    tune/manage/troubleshoot Storm.
> >    - Dropping the UIs (and making SQL the external interface) pretty much
> >    implies dropping the REST APIs and ES/Solr.  ES/Solr have been a giant
> >    source of dev heartache on the project and they exist primarily for the
> >    real time use case.  People can build whatever UIs or use existing tools
> >    against Parquet/Hive/whatever.
> >    - Dropping Ambari. It's a complex beast to install because of how many
> >    components we have. Dropping the above makes our install much easier and
> >    should alleviate the need for a complex installer.
> >
> > At that point, we're basically left with
> >
> >    - Some Spark for parse -> enrich -> output
> >    - The profiler
> >    - Stellar
> >    - Probably some other misc stuff (sensors, Bro Kafka plugin, etc.)
> >
> > At a glance, that seems almost an order of magnitude smaller than what we
> > currently try to handle.
> >
> > I'm not really sure what an appropriate way to handle the profiler is. I've
> > barely touched the code for it, so anything I say is a vague guess.
> >
> > On Wed, Apr 8, 2020 at 7:38 PM Yerex, Tom <tom.yerex@ubc.ca> wrote:
> >
> > > To me Metron is big and broad in the scope of technology required to get
> > > it running. If things were more modular that would go a long way to
> > > reducing the learning curve or at least putting it into smaller bites (and
> > > it might encourage more people to get involved).
> > >
> > > If the UI were an add-on module in another project, it would have made it
> > > easier for me and it could also encourage my hypothetical buddy who is a
> > > web developer expert to get involved since he could focus on the web-ui
> > > module instead of trying to tackle all the other pieces that are probably
> > > not part of his bailiwick.
> > >
> > > Stellar is very intriguing, maybe that is not unique to Metron? The
> > > architecture of Metron with respect to parsing, enriching, etc., makes a
> > > lot of sense to anyone I talk with. These two aspects of Metron seem like
> > > standout examples that make for a powerful platform to develop on.
> > >
> > > Thanks for continuing this discussion,
> > >
> > > Tom.
> > >
> > >
> > > On 2020-04-08 15:32:46-07:00 Casey Stella wrote:
> > >
> > > As far as I know there is no minimum bar of development activity to keep a
> > > project open.  I think we would all be grateful for any investment that you
> > > or your organization would want to make.
> > > It also occurs to me that your observation is absolutely spot on: we have
> > > a LOT of moving parts.
> > > I see some deficiencies here:
> > >
> > >   *   We depend on a lot of the various Hadoop ecosystem projects and they
> > > have to work together very precisely:
> > >      *   This makes for a system that is hard to install.
> > >      *   This also makes for a system which is hard to tune/manage
> > >   *   We have a large surface area of coverage
> > >      *   We have an installer, backend system and front-end UI, which
> > > stretches our developers a bit thin, especially since there isn't even
> > > interest in those systems
> > >
> > > Perhaps a reconsideration of the scope and technologies that we use would
> > > be merited?  If we were to decide to, for instance:
> > >
> > >   *   Consolidate scope: focus on a viable backend/API rather than a UI
> > >   *   Consolidate technology: reposition ourselves on top of Spark as a
> > > consolidated streaming/batch system
> > >   *   Make SQL our external interface: write out to Parquet + the Hive
> > > metastore and let users spin up Presto tables or Hive tables as they see fit
> > >
> > > This might reduce some of our surface area and make it more viable to get
> > > started?
> > > Anyway, just some thoughts.
> > > Casey
> > >
> > > On Wed, Apr 8, 2020 at 6:20 PM Yerex, Tom <tom.yerex@ubc.ca> wrote:
> > > Hi Casey,
> > >
> > > I'm new here and new to contributing to an open source project. Thus far
> > > my contribution has been questions, however the steep learning curve has
> > > had me working to understand all the moving parts for the last 18 months
> > > and I see that as a big investment by my organization.
> > >
> > > What is a level that would be viable?
> > >
> > > If my organization were to contribute I don't know that it would be soon
> > > enough or at the volume that is recognized as viable, which is why I ask
> > > the question.
> > >
> > >
> > > On 2020-04-08 15:05:51-07:00 Casey Stella wrote:
> > >
> > > Hi all,
> > >
> > > When composing the board report today, I realized that we have effectively
> > > had no development in the last quarter on this project.  Please be aware
> > > that I say this without a shred of blame or judgement (especially so
> > > considering I have not contributed in a long time).  That being said, I
> > > would like to pose the question to the community:
> > >
> > > Do we feel that this project is viable?  If so, how are we going to spur
> > > new contributions?  If not, then should we begin the process to fold the
> > > project?
> > >
> > >
> > > Best,
> > >
> > > Casey
> > >
> > >
> >
>
