nifi-dev mailing list archives

From Mike Thomsen <mikerthom...@gmail.com>
Subject Re: Benchmark and Timer driven vs. Event driven
Date Mon, 24 Feb 2020 02:49:54 GMT
A wide variety of institutions use NiFi for common enterprise data
processing, ETL and more. It is also very good at being plugged into
oddball locations in enterprise systems for tasks that people might never
even think of; one of the best examples I've seen was NiFi being used to
act as a fuzzer for downstream systems that were being considered by a
client as potential purchases.

As for consultants, I don't think availability is a real issue. Our data
engineering workforce is mostly junior, and they rarely need any sort of
intervention from more experienced data engineers. If you anticipate stiff
resistance unless you have an answer on day one for where to hire
expertise, the best option I am aware of would be Cloudera's professional
services team (I am not a Cloudera employee). They could also provide you
with commercial case studies if you anticipate that need.

Beyond that, I think we'd need to take this to a sidebar conversation,
because there are rules of decorum on ASF mailing lists that
vendor-related discussions can violate.

Hope that helps.

On Sun, Feb 23, 2020 at 7:26 PM Martin Ebert <martin.irgang@gmx.de> wrote:

> Can you send me at least 3 links to verify your statement? This would be
> really helpful.
>
> I see the potential of NiFi and would like to push it in management as
> well. Therefore it is essential to have as many good reasons as possible
> (besides my own experience).
>
> Who uses NiFi in concrete terms?
> How high is the satisfaction?
> Where can I find suitable consultants? And how many are freely available on
> the market?
> What are the success stories?
> ...
>
> I often hear the accusation that NiFi is just another open source tool. I
> cannot share this opinion.
>
>
>
> Mike Thomsen <mikerthomsen@gmail.com> wrote on Sun, Feb 23, 2020, 18:05:
>
> > Not with hard numbers, but when you look at job reqs and proposals it's
> > ***everywhere***. I also can't remember the last time I saw a data
> > engineering demo or discussion where NiFi or StreamSets wasn't the
> > foundation.
> >
> > On Sun, Feb 23, 2020 at 4:21 PM Martin Ebert <martin.irgang@gmx.de> wrote:
> >
> > > "NiFi is now emerging as the de facto standard for data engineering in
> > > the government market in the US in part because properly hardening it
> > > is closer to something a well-motivated intern can do than requiring a
> > > "seasoned professional.""
> > > Is there any way to prove this? Sounds interesting.
> > >
> > >
> > > Mike Thomsen <mikerthomsen@gmail.com> wrote on Sun, Feb 23, 2020, 17:08:
> > >
> > > > > I just made a few benchmarks with NiFi to compare it to another
> > > > > solution.
> > > >
> > > > Raw performance is only one consideration when choosing an ETL or
> > > > data orchestration tool. NiFi has some very critical competitive
> > > > advantages such as how aggressively it protects the contents of the
> > > > data flow from external failure (ex someone killing the JVM doesn't
> > > > corrupt hours of work) and how easy it is to very deeply harden** it
> > > > on the security side of things. Plus, you have the fact that unlike
> > > > many tools in this space, it's very agile in being able to stop a job
> > > > at any time and inspect the inputs and outputs.
> > > > ** NiFi is now emerging as the de facto standard for data engineering
> > > > in the government market in the US in part because properly hardening
> > > > it is closer to something a well-motivated intern can do than
> > > > requiring a "seasoned professional."
> > > >
> > > > On Sun, Feb 23, 2020 at 3:36 PM Marc Pellmann <pellmann@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > >
> > > > > > I am interested in some insight into timer driven vs. event
> > > > > > driven and the future plans for event driven.
> > > > >
> > > > >
> > > > > > I just made a few benchmarks with NiFi to compare it to another
> > > > > > solution.
> > > > >
> > > > >
> > > > > > The flows primarily consist of synchronous Web Service/REST like
> > > > > > calls. So I use HandleHttpRequest/HandleHttpResponse. In the
> > > > > > concrete example I just have two processors in between - a
> > > > > > ReplaceText and a TransformXml.
> > > > >
> > > > >
> > > > > > From the client side I use JMeter to generate the load (just POST
> > > > > > calls with a few bytes content).
> > > > >
> > > > >
> > > > > First I tested this with standard values, which means timer driven
> > > > > scheduling strategy and 1 task.
> > > > >
> > > > >
> > > > > > The numbers from these tests were not very impressive, so I
> > > > > > played with the configuration and set the scheduling strategy to
> > > > > > event driven (with task value 0 and maximum event driven thread
> > > > > > count of 1). This could only be done for the two processors in
> > > > > > between and not for the HandleHttpRequest/HandleHttpResponse
> > > > > > since they do not allow such configuration.
> > > > >
> > > > >
> > > > > > This increased the throughput by a factor of 6.
> > > > >
> > > > >
> > > > > > I also tried to increase the throughput with some other
> > > > > > configurations, such as more tasks or different run durations,
> > > > > > but this did not change the values significantly.
> > > > >
> > > > >
> > > > > > So at least for this type of scenario, the event driven
> > > > > > configuration is much better. But on the other side it is still
> > > > > > experimental and according to some posts it is not seen as a good
> > > > > > option and sounds more like it is something that might be
> > > > > > removed.
> > > > >
> > > > >
> > > > > Why is this?
> > > > >
> > > > >
> > > > > > Also I would expect an event driven configuration option for
> > > > > > HandleHttpRequest, since an incoming HTTP request is already an
> > > > > > event.
> > > > >
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Marc
> > > > >
> > > >
> > > >
> > >
> >
>
