nifi-dev mailing list archives

From Kumiko Yada <Kumiko.Y...@ds-iq.com>
Subject RE: Best way to process the processor requests in batch
Date Fri, 27 May 2016 17:02:25 GMT
Joe,

Thank you for your input.

I'd like to avoid multi-threading.  Would it be possible to loop through a ProcessSession
again once it's committed?  For example, take a total of 1000 requests, broken down into
batches of 100.  Create/transfer a flowfile per request, then once 100 requests are processed,
commit the session and loop through again.  Or would it be better to handle one flowfile at a
time, but transfer them in a batch?

Thanks
Kumiko

-----Original Message-----
From: Joe Witt [mailto:joe.witt@gmail.com] 
Sent: Thursday, May 26, 2016 7:17 PM
To: dev@nifi.apache.org
Subject: Re: Best way to process the processor requests in batch

Kumiko

A couple of quick thoughts to share.  You can absolutely code your processor to operate in
batches, and you can of course multi-thread the processor.  The general unit-of-work concept
Apache NiFi supports is called a ProcessSession; you can operate on as many flow files as
you need in that session and then commit it as one batch.  NiFi will automatically track/record
a lot of very nice information at the process session level.  In addition, NiFi will capture
provenance information, which is useful for understanding the specific items that went through
that flow, their latencies and such.  Beyond these options there is also a concept of counters,
which you can use to capture, generally for development purposes, interesting things you'd like
to observe over time.  You'll also want to get a good handle on what performance you should
expect when interacting with the web service independent of NiFi, so you have a good baseline
to work from.
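
As a rough illustration of committing work in batches from a single thread, here is a
minimal plain-Java sketch.  It does not use any NiFi classes: FakeSession is a hypothetical
stand-in for a ProcessSession, and the "response-for-" string is a placeholder for the real
web service call.  The batch size of 100 and the total of 1000 requests are only example
figures.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of per-batch commits; no NiFi classes are used.
class BatchCommitSketch {

    // Hypothetical stand-in for a ProcessSession: buffers results until commit().
    static class FakeSession {
        private final List<String> pending = new ArrayList<>();
        final List<List<String>> committedBatches = new ArrayList<>();

        void transfer(String result) {
            pending.add(result);
        }

        void commit() {
            if (!pending.isEmpty()) {
                committedBatches.add(new ArrayList<>(pending));
                pending.clear();
            }
        }
    }

    // Processes every request on the calling thread, committing after each
    // batchSize results and once more for any remainder.
    static FakeSession processInBatches(List<String> requests, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        FakeSession session = new FakeSession();
        int inBatch = 0;
        for (String request : requests) {
            // Placeholder for the real web service call (assumption).
            session.transfer("response-for-" + request);
            inBatch++;
            if (inBatch == batchSize) {
                session.commit();
                inBatch = 0;
            }
        }
        session.commit(); // flush a final partial batch, if any
        return session;
    }

    public static void main(String[] args) {
        List<String> requests = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            requests.add(String.format("%05d", i));
        }
        FakeSession session = processInBatches(requests, 100);
        // prints "committed batches: 10"
        System.out.println("committed batches: " + session.committedBatches.size());
    }
}
```

In an actual processor the corresponding calls would be the ProcessSession create, transfer
and commit methods; how the loop fits into onTrigger depends on the processor's design.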

The quota question is also one where you have choices and design decisions to make.  You can
bake this quota handling logic into your processor itself, or you could possibly wire in an
existing or new processor that specifically handles the quota/grouping logic you need;
it would have relationships such as 'within quota' and 'exceeds quota'.
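
One way the quota routing could look, as a minimal plain-Java sketch rather than a NiFi
processor: DailyQuotaSketch and its route method are hypothetical names, and the counter is
kept in memory only, which is an assumption made for brevity.  The two return values mirror
the 'within quota' and 'exceeds quota' relationships mentioned above.

```java
import java.time.LocalDate;

// In-memory daily quota counter; hypothetical, not part of NiFi.
class DailyQuotaSketch {
    static final String WITHIN_QUOTA = "within quota";
    static final String EXCEEDS_QUOTA = "exceeds quota";

    private final int dailyLimit;
    private LocalDate currentDay; // null until the first request
    private int usedToday;

    DailyQuotaSketch(int dailyLimit) {
        this.dailyLimit = dailyLimit;
    }

    // Returns the relationship name a request made on 'today' would be routed to.
    synchronized String route(LocalDate today) {
        if (!today.equals(currentDay)) {
            // First call, or a new day: reset the counter.
            currentDay = today;
            usedToday = 0;
        }
        if (usedToday < dailyLimit) {
            usedToday++;
            return WITHIN_QUOTA;
        }
        return EXCEEDS_QUOTA;
    }

    public static void main(String[] args) {
        DailyQuotaSketch quota = new DailyQuotaSketch(100);
        LocalDate today = LocalDate.now();
        int allowed = 0;
        for (int i = 0; i < 150; i++) {
            if (WITHIN_QUOTA.equals(quota.route(today))) {
                allowed++;
            }
        }
        // prints "allowed today: 100"
        System.out.println("allowed today: " + allowed);
    }
}
```

Because the count lives only in memory, it would be lost on restart; a real implementation
would likely need to persist it somewhere.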

I apologize for not giving a more precise response.  There are many ways to approach this,
and the best trade-offs will depend on finer details.  As you advance with this, please feel
free to ask more questions.  If you find things you wish were available and you think should
exist in NiFi, we'd love to have your contribution in any form (ideas, code, JIRAs, etc.).

Thanks
Joe

On Thu, May 26, 2016 at 9:08 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com> wrote:
> Hello,
>
> We implemented a custom processor, similar to InvokeHTTP, in which part of the URL can be
> replaced with values from a Context Data List property; the weather response is then
> written to the flowfile.  For example, the URL for the weather feed has to include the ZIP
> code.  The ZIP code appears as {0} in the URL and is replaced with each ZIP code from the
> Context Data List property.
>
> URL
> http://example{0}/weather
>
> Context Data List:
> 00000
> 11111
> 22222
>
> The processor will make the following requests:
> http://example{0}/weather
>
> http://example00000/weather
> http://example11111/weather
> http://example22222/weather
>
> This processor handles one request at a time and has a performance issue.  I'd like to
> modify it to process requests in batches.  What is the best way to process in batches?
> Also, does NiFi keep track of how many requests the processor has processed?  If so, how
> does NiFi track this, and how long does it keep the data?  I'd like to add quota
> priorities to this processor to keep track of the quota.  For example, if the weather feed
> allows only 100 requests a day, I don't want the processor to execute once the quota is
> reached.
>
> Thanks
> Kumiko
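
The URL substitution described in the quoted message can be sketched in plain Java as
follows.  This is only an illustration: the thread does not say how the custom processor
performs the replacement, so the use of java.text.MessageFormat and the names
UrlTemplateSketch and expand are assumptions.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Expands a URL template containing {0} once per context value.
class UrlTemplateSketch {

    static List<String> expand(String template, List<String> contextData) {
        List<String> urls = new ArrayList<>();
        for (String value : contextData) {
            // MessageFormat replaces {0} with the first argument.
            urls.add(MessageFormat.format(template, value));
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> zipCodes = Arrays.asList("00000", "11111", "22222");
        for (String url : expand("http://example{0}/weather", zipCodes)) {
            System.out.println(url);
        }
        // prints:
        // http://example00000/weather
        // http://example11111/weather
        // http://example22222/weather
    }
}
```

Note that MessageFormat gives special meaning to single quotes and braces, so a template
containing those characters would need escaping.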