nifi-users mailing list archives

From Joe Percivall <>
Subject Re: Guidance for NiFi output streaming
Date Thu, 26 May 2016 14:02:15 GMT
Hello Stephane,

Just to be sure I have your use case correct: you are ingesting a continuous stream of lat/lon
information for various devices. Every 1 second you want to take the information from the
previous second and write out just the most recent lat/lon of each device.

An important question: do you only want this file to include devices that have been seen in the
last second, or do you want to write out the last known lat/lon of every device ever seen? That
distinction matters because it is the difference between having to store state or not.
If you need the last known position of all devices seen, and thus need to store state, the use case
gets much trickier.
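To make the stateless/stateful distinction concrete, here is a minimal Java sketch that is independent of the NiFi API. The class `LatestPositions` and its `keepState` flag are hypothetical names for illustration only. The sketch shows that the two variants differ only in whether the per-device map is cleared after each 1-second flush; a real stateful processor would presumably also need to persist that map (for example across restarts), which this in-memory sketch does not attempt.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a NiFi API): keeps the latest position per device_id.
class LatestPositions {
    // device_id -> {lat, lon}; LinkedHashMap keeps first-seen order for stable output
    private final Map<String, double[]> latest = new LinkedHashMap<>();
    private final boolean keepState;

    LatestPositions(boolean keepState) {
        this.keepState = keepState;
    }

    // Called for every incoming point; a later point for the same device overwrites the earlier one.
    void update(String deviceId, double lat, double lon) {
        latest.put(deviceId, new double[] {lat, lon});
    }

    // Called once per 1-second tick; returns a copy of what should be written out.
    Map<String, double[]> flush() {
        Map<String, double[]> snapshot = new LinkedHashMap<>(latest);
        if (!keepState) {
            // Stateless variant: only devices seen since the previous tick appear next time.
            latest.clear();
        }
        // Stateful variant: the map keeps every device ever seen, which is the
        // state a real processor would have to store somewhere.
        return snapshot;
    }

    public static void main(String[] args) {
        LatestPositions stateless = new LatestPositions(false);
        stateless.update("1", -37.0, 20.0);
        stateless.update("1", -37.1, 20.1);
        System.out.println(stateless.flush().keySet()); // [1]
        stateless.update("2", -40.1, 29.9);
        System.out.println(stateless.flush().keySet()); // [2]
    }
}
```

With `keepState` set to `true`, the second flush in `main` would instead report both devices, since device 1's last known position is retained.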

Another question: what order of magnitude of data are you planning on ingesting? If it's relatively
low and your use case does not need to store state, you could create a processor that
analyzes all FlowFiles currently on the queue, grabs the latest lat/lon for each device, and
then emits a FlowFile whose content is the file you want to write. Set it to trigger every
1 second and it would batch up the latest lat/lon for each device for the previous second.
This would start to cause problems when it tries to batch up a large quantity of FlowFiles,
as with MergeContent.
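The per-tick work of such a processor might look roughly like the following Java sketch. It deliberately avoids the NiFi API: a plain `Queue` stands in for the incoming connection, and the hypothetical `Point` class stands in for one FlowFile's `device_id`, `lat` and `lon` attributes. In an actual custom processor this logic would presumably live in `onTrigger`, with the processor timer-driven on a 1 sec Run Schedule. The JSON layout is an assumption, and `device_id` is assumed not to need escaping.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.StringJoiner;

// Sketch of the per-tick logic only; names are hypothetical, not NiFi APIs.
class LatestPerDeviceBatcher {

    // Stand-in for one FlowFile's attributes: device_id, lat, lon.
    static final class Point {
        final String deviceId;
        final double lat;
        final double lon;

        Point(String deviceId, double lat, double lon) {
            this.deviceId = deviceId;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Drains everything queued since the last tick and renders one JSON array
    // holding only the newest point per device (queue order = arrival order).
    static String drainAndRender(Queue<Point> queue) {
        Map<String, Point> latest = new LinkedHashMap<>();
        Point p;
        while ((p = queue.poll()) != null) {
            latest.put(p.deviceId, p); // later arrivals overwrite earlier ones
        }
        StringJoiner json = new StringJoiner(",", "[", "]");
        for (Point q : latest.values()) {
            json.add("{\"device_id\":\"" + q.deviceId + "\",\"lat\":" + q.lat
                    + ",\"lon\":" + q.lon + "}");
        }
        return json.toString();
    }

    public static void main(String[] args) {
        Queue<Point> queue = new ArrayDeque<>();
        queue.add(new Point("1", -37, 20));
        queue.add(new Point("1", -37.1, 20.1));
        queue.add(new Point("2", -40, 30));
        queue.add(new Point("2", -40.1, 29.9));
        // prints [{"device_id":"1","lat":-37.1,"lon":20.1},{"device_id":"2","lat":-40.1,"lon":29.9}]
        System.out.println(drainAndRender(queue));
    }
}
```

Because every queued point is drained on each tick, the work per tick grows with the number of FlowFiles that arrived in that second, which is where the scaling concern above comes from.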

- - - - - - 
Joseph Percivall

On Thursday, May 26, 2016 1:06 AM, Stéphane Maarek <> wrote:

I have tried ControlRate, but it doesn't work: it seems to stop processing once the
threshold of 1 is reached, even though I set a grouping property (I know there are two different
values for my group in my queue). Any clue?

On Thu, May 26, 2016 at 2:30 PM Stéphane Maarek <> wrote:

>I need to output some data streaming from multiple devices directly onto a map (mapboxjs).

>Basically, every 1 second, I want to write only the last data point for each device to
>a JSON file. My problem is how to pick the latest data point per device.
>My incoming flow file has three attributes: device_id, lat, lon.
>At some point they may queue up like this:
>1, (-37,20)
>1, (-37.1,20.1)
>2, (-40,30)
>2, (-40.1, 29.9)
>At the end, I wish to only have the latest point for each device ID
>1, (-37.1,20.1)
>2, (-40.1, 29.9)
>How can I design a processor for this?
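For the reduction itself, the merge function of Java's `Collectors.toMap` expresses "keep the latest point per device" directly. Below is a minimal sketch on the example data above, assuming the points are processed in arrival order and that each row mirrors the `device_id`, `lat` and `lon` attributes as strings. `PickLatest` and `latestByDevice` are hypothetical names, not part of NiFi.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Reduces queued rows {device_id, lat, lon} to the latest point per device_id.
class PickLatest {
    static Map<String, String> latestByDevice(List<String[]> rows) {
        return rows.stream().collect(Collectors.toMap(
                row -> row[0],                             // key: device_id
                row -> "(" + row[1] + "," + row[2] + ")",  // value: "(lat,lon)"
                (earlier, later) -> later,                 // duplicate device: keep the later arrival
                LinkedHashMap::new));                      // preserve first-seen device order
    }

    public static void main(String[] args) {
        List<String[]> queued = Arrays.asList(
                new String[] {"1", "-37", "20"},
                new String[] {"1", "-37.1", "20.1"},
                new String[] {"2", "-40", "30"},
                new String[] {"2", "-40.1", "29.9"});
        // prints {1=(-37.1,20.1), 2=(-40.1,29.9)}
        System.out.println(latestByDevice(queued));
    }
}
```

The merge function only sees rows in encounter order because the stream is sequential; if the input order did not reflect arrival time, a timestamp comparison would be needed instead.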
