nifi-dev mailing list archives

From Marc Pellmann <pellm...@gmail.com>
Subject Re: Benchmark and Timer driven vs. Event driven
Date Mon, 24 Feb 2020 12:42:50 GMT
Hi Mike,

thank you for your reply!

I think I need to clarify my intention. I think NiFi is a very good product,
and I am also aware of the positive aspects besides performance that you
mentioned!

My concern is just the difference between event driven and timer driven
scheduling and the future plans for it. Or, to rephrase: I am asking how
future-proof flows created with event driven scheduling are.

Best regards,
Marc

On Sun, Feb 23, 2020 at 5:08 PM Mike Thomsen <mikerthomsen@gmail.com> wrote:

> > I just made a few benchmarks with NiFi to compare it to another solution.
>
> Raw performance is only one consideration when choosing an ETL or data
> orchestration tool. NiFi has some very critical competitive advantages such
> as how aggressively it protects the contents of the data flow from external
> failure (e.g., someone killing the JVM doesn't corrupt hours of work) and how
> easy it is to very deeply harden** it on the security side of things. Plus,
> you have the fact that unlike many tools in this space, it's very agile in
> being able to stop a job at any time and inspect the inputs and outputs.
>
> ** NiFi is now emerging as the de facto standard for data engineering in
> the government market in the US in part because properly hardening it is
> closer to something a well-motivated intern can do than requiring a
> "seasoned professional."
>
> On Sun, Feb 23, 2020 at 3:36 PM Marc Pellmann <pellmann@gmail.com> wrote:
>
> > Hi,
> >
> >
> > I am interested in some insight into timer driven vs. event driven and the
> > future plans for event driven.
> >
> >
> > I just made a few benchmarks with NiFi to compare it to another solution.
> >
> >
> > The flows primarily consist of synchronous Web Service/REST-like calls. So
> > I use HandleHttpRequest/HandleHttpResponse. In the concrete example I just
> > have two processors in between - a ReplaceText and a TransformXml.
> >
> >
> > From the client side I use JMeter to generate the load (just POST calls
> > with a few bytes content).
> >
> >
> > First I tested this with standard values, which means timer driven
> > scheduling strategy and 1 task.
> >
> >
> > The numbers from these tests were not very impressive, so I played with the
> > configuration and set the scheduling strategy to event driven (with task
> > value 0 and maximum event driven thread count of 1). This could only be
> > done for the two processors in between and not for
> > HandleHttpRequest/HandleHttpResponse, since they do not allow such a
> > configuration.
> >
> >
> > This increased the throughput by a factor of 6.
> >
> >
> > I also tried to increase the throughput with some other configurations,
> > such as more tasks or different run durations, but this did not change the
> > values significantly.
> >
> >
> > So at least for this type of scenario, the event driven configuration is
> > much better. But on the other hand it is still experimental, and according
> > to some posts it is not seen as a good option and sounds more like
> > something that might be removed.
> >
> >
> > Why is this?
> >
> >
> > Also I would expect an event driven configuration option for
> > HandleHttpRequest, since the arrival of an HTTP request is already an
> > event.
> >
> >
> > Best regards,
> >
> > Marc
> >
>
