nifi-users mailing list archives

From Juan Sequeiros <helloj...@gmail.com>
Subject Re: Options for increasing performance?
Date Wed, 05 Apr 2017 20:01:11 GMT
If you have Wireshark, you could use:

 tshark -f "port 8446 or port 9448"
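
A couple of hedged variations on the same idea (CLI fragments only; `-f` is a libpcap capture filter, so both commands need capture privileges, and the `-Y` display filter only matches if tshark can dissect the plaintext HTTP):

```shell
# Count only POSTs arriving on the two HandleHttpRequest ports.
tshark -f "tcp port 8446 or tcp port 9448" -Y 'http.request.method == "POST"'

# Without Wireshark installed, tcpdump shows the same arrivals:
tcpdump -n "tcp port 8446 or tcp port 9448"
```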


On Wed, Apr 5, 2017 at 3:45 PM James McMahon <jsmcmahon3@gmail.com> wrote:

> Thank you Bryan. I will explore these things. I suspect we are not
> receiving from the source optimally. The reason I say that is this: I am
> doing manual refreshes on my flow page every 3 to 4 seconds. Frequently I
> go through 3 or 4 refreshes, and no figures change in my queues nor in my
> processors. It seems like my workflow is just sitting there waiting for
> new arrivals.
> I am using ports 8446 and 9448 (I have two HandleHttpRequest processors
> now). Does anyone know of a few commands I can use to monitor arrivals of
> incoming POSTs at my ports? Is this something I can monitor using the
> Firefox developer features? -Jim
>
> On Wed, Apr 5, 2017 at 3:39 PM, Bryan Rosander <brosander@apache.org>
> wrote:
>
> This seems to have gotten lost in the chain, resending (please disregard
> if you've already read/tried it):
>
> Another thing to consider is whether the bottleneck is in NiFi or before
> the data gets there.  Is the source of data capable of making POST
> requests more quickly than it currently does, as configured? Is network
> latency or throughput a limitation?  You might try posting to another
> HTTP server to see whether the problem is within NiFi.
>
> E.g. modify something like https://gist.github.com/bradmontgomery/2219997 to
> log requests and see if the rate is similar even when no other processing
> is done on the server side.
>
> If you go with the python server, you may want to use the threading mixin
> as well.
>
>
> http://stackoverflow.com/questions/14088294/multithreaded-web-server-in-python
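
A minimal sketch of Bryan's suggestion, assuming Python 3.7+ (the gist linked above targets Python 2's BaseHTTPServer). The stdlib `ThreadingHTTPServer` bakes in the threading mixin, and the port below is just a stand-in for 8446/9448 from this thread:

```python
# Sketch of a no-op POST receiver that only counts and logs arrivals.
# ThreadingHTTPServer handles each request on its own thread, so slow
# requests don't serialize the rest (the threading-mixin point above).
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
import threading
import time

class CountingHandler(BaseHTTPRequestHandler):
    count = 0                      # arrivals seen so far (shared)
    lock = threading.Lock()

    def do_POST(self):
        # Drain the body, count the arrival, reply 200 with no processing.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        with CountingHandler.lock:
            CountingHandler.count += 1
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        # Timestamp each request line so the arrival rate is easy to eyeball.
        print(time.strftime("%H:%M:%S"), fmt % args)

# To stand in for one of the HandleHttpRequest ports from this thread:
#   ThreadingHTTPServer(("", 8446), CountingHandler).serve_forever()
```

If the sender tops out at roughly the same rate against this no-op server, the bottleneck is upstream of NiFi.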
>
> Thanks,
> Bryan
>
> On Wed, Apr 5, 2017 at 3:19 PM, James McMahon <jsmcmahon3@gmail.com>
> wrote:
>
> We are not seeing 503s. We have tried setting up a second
> HandleHttpRequest watching a different port, and round-robining to the
> two ports. We made a relatively small gain, from about 5 minutes for 100
> files consistently to 4:40 for 100. I watch my workflow, and at no point
> does a large number of flowfiles queue up in any queue leading into or
> coming out of any processor.
>
> On Wed, Apr 5, 2017 at 2:44 PM, Bryan Rosander <brosander@apache.org>
> wrote:
>
> It looks like HandleHttpRequest should be sending back a 503 if its
> containerQueue fills up (default capacity of 50 requests that have been
> accepted but not yet processed in an onTrigger()) [1].  Also, the default
> thread pool the Jetty server uses should be able to create up to 200
> threads to accept connections, and the handler uses an async context, so
> the in-flight flowfiles shouldn't be holding up new requests.
>
> If you're not seeing 503s it might be on the sender side of the equation.
> Is the sender doing posts concurrently or waiting on each to complete
> before sending another?
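
The sender-side difference Bryan is asking about can be sketched like this (an illustration only, assuming a Python 3 sender; the URL and payloads are stand-ins for the real Pentaho source):

```python
# Sketch of concurrent vs. serial POSTs from the sender side.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def post_one(url, payload):
    # One blocking POST; a serial sender calls this in a loop and waits
    # for each response before sending the next.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.getcode()

def post_all(url, payloads, workers=8):
    # A worker pool keeps several requests in flight at once, which is
    # what lets the receiver's thread pool actually help.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: post_one(url, p), payloads))
```

If the source only ever has one request in flight, extra HandleHttpRequest processors or threads on the NiFi side won't raise throughput.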
>
> [1]
> https://github.com/apache/nifi/blob/rel/nifi-0.7.0/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java#L395
>
> On Wed, Apr 5, 2017 at 2:27 PM, Joe Witt <joe.witt@gmail.com> wrote:
>
> Much of this goodness can be found in the help->Users Guide.
> Adjusting run duration/scheduling factors:
>
> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
>
> These are the latest docs but I'm sure there is coverage in the older
> stuff.
>
> Thanks
>
> On Wed, Apr 5, 2017 at 2:23 PM, James McMahon <jsmcmahon3@gmail.com>
> wrote:
> > Yes sir! Sure am. And I know, because I have committed that very silly
> > mistake before. We are indeed seeing # responses = # requests  -Jim
> >
> > On Wed, Apr 5, 2017 at 2:13 PM, Bryan Rosander <brosander@apache.org>
> wrote:
> >>
> >> Hey James,
> >>
> >> Are you making sure that every route from HandleHttpRequest goes to a
> >> HandleHttpResponse?  If not, the StandardHttpContextMap may be filling
> >> up with requests, which would probably delay processing.
> >>
> >> Thanks,
> >> Bryan
> >>
> >> On Wed, Apr 5, 2017 at 2:07 PM, James McMahon <jsmcmahon3@gmail.com>
> >> wrote:
> >>>
> >>> Thank you very much Matt. I have cranked my Concurrent Tasks config
> >>> parm on my ExecuteScripts up to 20, and judging by the empty queue
> >>> feeding that processor, it is screaming through the flowfiles
> >>> arriving at its doorstep.
> >>>
> >>> Can anyone comment on performance optimizations for HandleHttpRequest?
> >>> In your experience, is HandleHttpRequest a bottleneck? I do notice
> >>> that I often have a count for "flowfiles in process" within the
> >>> processor, anywhere from 1 to 10 when it does show such a count.
> >>>
> >>> -Jim
> >>>
> >>> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <mattyb149@apache.org>
> >>> wrote:
> >>>>
> >>>> Jim,
> >>>>
> >>>> One quick thing you can try is to use GenerateFlowFile to feed your
> >>>> ExecuteScript instead of HandleHttpRequest. You can configure it to
> >>>> send whatever body with whatever attributes (such as you would get
> >>>> from HandleHttpRequest), and to send files at whatever rate the
> >>>> processor is scheduled. This might rule ExecuteScript out of the
> >>>> bottleneck equation; if you are getting plenty of throughput without
> >>>> HandleHttpRequest, then that's probably your bottleneck.
> >>>>
> >>>> I'm not sure offhand about optimizations for HandleHttpRequest,
> >>>> perhaps someone else will jump in :)
> >>>>
> >>>> Regards,
> >>>> Matt
> >>>>
> >>>>
> >>>> On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <jsmcmahon3@gmail.com>
> >>>> wrote:
> >>>> > I am receiving POSTs from a Pentaho process, delivering files to
> >>>> > my NiFi 0.7.x workflow HandleHttpRequest processor. That processor
> >>>> > hands the flowfile off to an ExecuteScript processor that runs a
> >>>> > Python script. This script is very, very simple: it takes an
> >>>> > incoming JSON object and loads it into a Python dictionary, and
> >>>> > verifies the presence of required fields using simple has_key
> >>>> > checks on the dictionary. There are only eight fields in the
> >>>> > incoming JSON object.
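
For reference, the check Jim describes amounts to something like this (a sketch; the eight field names below are invented, and while Jython dicts do support `has_key`, membership with `in` is the portable spelling):

```python
# Sketch of the validation step described above. The field names are
# hypothetical stand-ins; only the shape of the check is from the thread.
import json

REQUIRED_FIELDS = ["id", "timestamp", "source", "type",
                   "name", "size", "format", "payload"]  # hypothetical

def missing_fields(raw_json):
    """Return the required fields absent from the incoming JSON object."""
    record = json.loads(raw_json)
    # Equivalent to looping with record.has_key(f) under Python 2/Jython.
    return [f for f in REQUIRED_FIELDS if f not in record]
```

A check this cheap is unlikely to be the bottleneck on its own, which fits Matt's suggestion to isolate it with GenerateFlowFile.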
> >>>> >
> >>>> > The throughput for these two processes is not exceeding 100-150
> >>>> > files in five minutes. It seems very slow in light of the minimal
> >>>> > processing going on in these two steps.
> >>>> >
> >>>> > I notice that there are configuration options seemingly related to
> >>>> > optimizing performance. "Concurrent tasks", for example, is only
> >>>> > set by default to 1 for each processor.
> >>>> >
> >>>> > What performance optimizations at the processor level do users
> >>>> > recommend? Is it advisable to crank up the concurrent tasks for a
> >>>> > processor, and is there an optimal performance point beyond which
> >>>> > you should not crank up that value? Are there trade-offs?
> >>>> >
> >>>> > I am particularly interested in optimizations for HandleHttpRequest
> >>>> > and
> >>>> > ExecuteScript processors.
> >>>> >
> >>>> > Thanks in advance for your thoughts.
> >>>> >
> >>>> > cheers,
> >>>> >
> >>>> > Jim
> >>>
> >>>
> >>
> >
>
