nifi-dev mailing list archives

From Mike Thomsen <mikerthom...@gmail.com>
Subject Re: Benchmark and Timer driven vs. Event driven
Date Sun, 23 Feb 2020 17:05:13 GMT
Not with hard numbers, but when you look at job reqs and proposals it's
***everywhere***. I also can't remember the last time I saw a data
engineering demo or discussion where NiFi or StreamSets wasn't the
foundation.

On Sun, Feb 23, 2020 at 4:21 PM Martin Ebert <martin.irgang@gmx.de> wrote:

> "NiFi is now emerging as the de facto standard for data engineering in
> the government market in the US in part because properly hardening it is
> closer to something a well-motivated intern can do than requiring a
> "seasoned professional.""
> Is there any way to prove this? Sounds interesting.
>
>
> On Sun, Feb 23, 2020 at 17:08 Mike Thomsen <mikerthomsen@gmail.com> wrote:
>
> > > I just made a few benchmarks with NiFi to compare it to another
> > > solution.
> >
> > Raw performance is only one consideration when choosing an ETL or data
> > orchestration tool. NiFi has some very critical competitive advantages,
> > such as how aggressively it protects the contents of the data flow from
> > external failure (e.g., someone killing the JVM doesn't corrupt hours of
> > work) and how easy it is to very deeply harden** it on the security side
> > of things. Plus, unlike many tools in this space, it's very agile in
> > being able to stop a job at any time and inspect the inputs and outputs.
> >
> > ** NiFi is now emerging as the de facto standard for data engineering in
> > the government market in the US in part because properly hardening it is
> > closer to something a well-motivated intern can do than requiring a
> > "seasoned professional."
> >
> > On Sun, Feb 23, 2020 at 3:36 PM Marc Pellmann <pellmann@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > >
> > > I am interested in some insight into timer driven vs. event driven
> > > scheduling and the future plans for event driven.
> > >
> > >
> > > I just made a few benchmarks with NiFi to compare it to another
> > > solution.
> > >
> > >
> > > The flows primarily consist of synchronous Web Service/REST-like calls,
> > > so I use HandleHttpRequest/HandleHttpResponse. In the concrete example
> > > I just have two processors in between - a ReplaceText and a
> > > TransformXml.
> > >
> > >
> > > From the client side I use JMeter to generate the load (just POST calls
> > > with a few bytes of content).
> > >
> > >
> > > First I tested this with standard values, which means timer driven
> > > scheduling strategy and 1 task.
> > >
> > >
> > > The numbers from these tests were not very impressive, so I played with
> > > the configuration and set the scheduling strategy to event driven (with
> > > a task value of 0 and a maximum event driven thread count of 1). This
> > > could only be done for the two processors in between and not for
> > > HandleHttpRequest/HandleHttpResponse, since they do not allow such
> > > configuration.
> > >
> > >
> > > This increased the throughput by a factor of 6.
> > >
> > >
> > > I also tried to increase the throughput with some other configurations,
> > > such as more tasks or different run durations, but this did not change
> > > the values significantly.
> > >
> > >
> > > So at least for this type of scenario, the event driven configuration
> > > is much better. On the other hand, it is still experimental, and
> > > according to some posts it is not seen as a good option and sounds more
> > > like something that might be removed.
> > >
> > >
> > > Why is this?
> > >
> > >
> > > Also, I would expect an event driven configuration option for
> > > HandleHttpRequest, since an incoming HTTP request is already an event.
> > >
> > >
> > > Best regards,
> > >
> > > Marc
> > >
> >
> >
>
