nifi-users mailing list archives

From Bryan Rosander <brosan...@apache.org>
Subject Re: Options for increasing performance?
Date Wed, 05 Apr 2017 19:39:18 GMT
This seems to have gotten lost in the chain, resending (please disregard if
you've already read/tried it):

Another thing to consider is whether the bottleneck is in NiFi or before the
data gets there.  Is the source of the data, as configured, capable of making
POST requests more quickly than that?  Is network latency or throughput a
limitation?  You might try posting to another HTTP server to see whether
the problem is within NiFi.

E.g. modify something like https://gist.github.com/bradmontgomery/2219997 to
log requests and see if the rate is similar even when no other processing
is done on the server side.
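
To compare rates against either server, a small sender-side timing script can
help. What follows is only a sketch in Python 3 using the standard library;
the function names, the worker count, and the URL in the usage comment are
illustrative assumptions, not anything taken from the Pentaho job or from NiFi.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_once(url, body):
    """POST one body and return the HTTP status code (including 4xx/5xx)."""
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses such as 503; report the code instead.
        return err.code


def measure_post_rate(url, bodies, workers=1):
    """Send every body to url with `workers` threads.

    Returns (list of status codes, requests per second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(lambda body: post_once(url, body), bodies))
    elapsed = time.perf_counter() - start
    rate = len(bodies) / elapsed if elapsed > 0 else float("inf")
    return statuses, rate


# Hypothetical usage: the host, port, and path are placeholders.
# statuses, rate = measure_post_rate(
#     "http://localhost:8080/ingest", [b'{"field": "value"}'] * 100, workers=4)
# print(rate, "requests/second;", statuses.count(503), "responses were 503")
```

Running it once with workers=1 and once with a higher value gives a rough
idea of whether sending concurrently changes the rate at all.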

If you go with the python server, you may want to use the threading mixin
as well.

http://stackoverflow.com/questions/14088294/multithreaded-web-server-in-python
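
For reference, a minimal sketch of what such a logging server with the
threading mixin might look like in Python 3 (standard library only). This is
not the code from the gist; the class names, the make_server helper, and the
default port are assumptions made for illustration.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn


class CountingHandler(BaseHTTPRequestHandler):
    """Accept any POST, discard the body, and log a running count."""

    count = 0
    first_seen = None
    lock = threading.Lock()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # drain the body; no other processing
        cls = CountingHandler
        with cls.lock:
            if cls.first_seen is None:
                cls.first_seen = time.monotonic()
            cls.count += 1
            seen = cls.count
            elapsed = time.monotonic() - cls.first_seen
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()
        # log_message writes to stderr along with the client address.
        self.log_message("request %d, %.1f s since first request", seen, elapsed)


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """HTTPServer that handles each request in its own thread."""

    daemon_threads = True


def make_server(host="127.0.0.1", port=8080):
    """Build the server; call serve_forever() on the result to run it."""
    return ThreadedHTTPServer((host, port), CountingHandler)


# Hypothetical usage:
# make_server(port=8080).serve_forever()
```

If the sender reaches a much higher rate against this than against
HandleHttpRequest, the limit is more likely inside NiFi; if the rate is about
the same, the sender or the network is the more likely suspect.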

Thanks,
Bryan

On Wed, Apr 5, 2017 at 3:19 PM, James McMahon <jsmcmahon3@gmail.com> wrote:

> We are not seeing 503s. We have tried setting up a second
> HandleHttpRequest, watching a different port, and round-robining between the
> two ports. We saw a relatively small gain, from about 5 minutes for 100 files
> consistently to 4:40 for 100. I watch my workflow, and at no point does a
> large number of flowfiles queue up in any queue leading into or coming out
> of any processor.
>
> On Wed, Apr 5, 2017 at 2:44 PM, Bryan Rosander <brosander@apache.org>
> wrote:
>
>> It looks like HandleHttpRequest should be sending back a 503 if its
>> containerQueue fills up (default capacity of 50 requests that have been
>> accepted but not processed in an onTrigger()) [1].  Also, the default
>> thread pool the jetty server is using should be able to create up to 200
>> threads to accept connections and the handler is using an async context so
>> the in-flight flow files shouldn't be holding up new requests.
>>
>> If you're not seeing 503s it might be on the sender side of the
>> equation.  Is the sender doing posts concurrently or waiting on each to
>> complete before sending another?
>>
>> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.0/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java#L395
>>
>> On Wed, Apr 5, 2017 at 2:27 PM, Joe Witt <joe.witt@gmail.com> wrote:
>>
>>> Much of this goodness can be found in the help->Users Guide.
>>> Adjusting run duration/scheduling factors:
>>>   https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
>>>
>>> These are the latest docs but I'm sure there is coverage in the older
>>> stuff.
>>>
>>> Thanks
>>>
>>> On Wed, Apr 5, 2017 at 2:23 PM, James McMahon <jsmcmahon3@gmail.com>
>>> wrote:
>>> > Yes sir! Sure am. And I know, because I have committed that very silly
>>> > mistake before. We are indeed seeing # responses = # requests  -Jim
>>> >
>>> > On Wed, Apr 5, 2017 at 2:13 PM, Bryan Rosander <brosander@apache.org>
>>> > wrote:
>>> >>
>>> >> Hey James,
>>> >>
>>> >> Are you making sure that every route from HandleHttpRequest goes to a
>>> >> HandleHttpResponse?  If not, the StandardHttpContextMap may be filling up
>>> >> with requests which would probably delay processing.
>>> >>
>>> >> Thanks,
>>> >> Bryan
>>> >>
>>> >> On Wed, Apr 5, 2017 at 2:07 PM, James McMahon <jsmcmahon3@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Thank you very much Matt. I have cranked my Concurrent Tasks config parm
>>> >>> on my ExecuteScripts up to 20, and judging by the empty queue feeding that
>>> >>> processor it is screaming through the flowfiles arriving at its doorstep.
>>> >>>
>>> >>> Can anyone comment on performance optimizations for HandleHttpRequest? In
>>> >>> your experiences, is HandleHttpRequest a bottleneck? I do notice that I
>>> >>> often have a count in the processor for "flowfile in process" within the
>>> >>> processor. Anywhere from 1 to 10 when it does show such a count.
>>> >>>
>>> >>> -Jim
>>> >>>
>>> >>> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <mattyb149@apache.org>
>>> >>> wrote:
>>> >>>>
>>> >>>> Jim,
>>> >>>>
>>> >>>> One quick thing you can try is to use GenerateFlowFile to send to your
>>> >>>> ExecuteScript instead of HandleHttpRequest, you can configure it to
>>> >>>> send whatever body with whatever attributes (such that you would get
>>> >>>> from HandleHttpRequest) and send files at whatever rate the processor
>>> >>>> is scheduled. This might take ExecuteScript out of the bottleneck
>>> >>>> equation; if you are getting plenty of throughput without
>>> >>>> HandleHttpRequest then that's probably your bottleneck.
>>> >>>>
>>> >>>> I'm not sure offhand about optimizations for HandleHttpRequest,
>>> >>>> perhaps someone else will jump in :)
>>> >>>>
>>> >>>> Regards,
>>> >>>> Matt
>>> >>>>
>>> >>>>
>>> >>>> On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <jsmcmahon3@gmail.com>
>>> >>>> wrote:
>>> >>>> > I am receiving POSTs from a Pentaho process, delivering files to my
>>> >>>> > NiFi 0.7.x workflow HandleHttpRequest processor. That processor hands
>>> >>>> > the flowfile off to an ExecuteScript processor that runs a python
>>> >>>> > script. This script is very, very simple: it takes an incoming JSON
>>> >>>> > object and loads it into a Python dictionary, and verifies the
>>> >>>> > presence of required fields using simple has_key checks on the
>>> >>>> > dictionary. There are only eight fields in the incoming JSON object.
>>> >>>> >
>>> >>>> > The throughput for these two processes is not exceeding 100-150 files
>>> >>>> > in five minutes. It seems very slow in light of the minimal processing
>>> >>>> > going on in these two steps.
>>> >>>> >
>>> >>>> > I notice that there are configuration operations seemingly related to
>>> >>>> > optimizing performance. "Concurrent tasks", for example, is only set
>>> >>>> > by default to 1 for each processor.
>>> >>>> >
>>> >>>> > What performance optimizations at the processor level do users
>>> >>>> > recommend? Is it advisable to crank up the concurrent tasks for a
>>> >>>> > processor, and is there an optimal performance point beyond which you
>>> >>>> > should not crank up that value? Are there trade-offs?
>>> >>>> >
>>> >>>> > I am particularly interested in optimizations for HandleHttpRequest
>>> >>>> > and ExecuteScript processors.
>>> >>>> >
>>> >>>> > Thanks in advance for your thoughts.
>>> >>>> >
>>> >>>> > cheers,
>>> >>>> >
>>> >>>> > Jim
>>> >>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>
