nifi-dev mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: Merging the unique attributes of 2 flowfiles
Date Mon, 01 Jun 2020 14:32:06 GMT
David,

I suspect you need to change the “Maximum number of Bins” property. The default, I believe,
is 1 bin. Or maybe 5 or 10. Something small. A small number of bins works fine if you’re not
using a Correlation Attribute.

When a FlowFile comes into the Processor, the Processor has to determine which bin to put
the FlowFile in. If using the Correlation Attribute, it determines that by looking at the
value of the attribute. So when FlowFile 1 comes in with an attribute value of Foo, it goes
to Bin 1. FlowFile 2 comes in with an attribute value of Bar and it goes to Bin 2. FlowFile
3 comes in and has an attribute value of Baz so it goes to Bin 3.

So let’s say that we’ve filled up all of the bins, and another FlowFile comes in. It must
go to one of the existing bins or be put into a new bin. If its attribute value matches the value
of one of the bins, it’ll be merged together with the other FlowFiles in that bin. But if it doesn’t
match any of the bins, it needs its own, new bin. Since all of the bins have now been used
up, the Processor must evict one of the existing bins prematurely and fail it.
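The bin selection and eviction described above can be modelled with a small sketch. This is a hypothetical simplification in Python, not NiFi’s actual MergeContent code: `BinManager`, `offer` and `simulate` are made-up names, a bin is treated as complete once it holds two FlowFiles (matching the minimum and maximum of 2 in the settings quoted below), Max Bin Age is ignored, and it assumes the oldest open bin is the one evicted when no bin is free.

```python
from __future__ import annotations

from collections import OrderedDict


class BinManager:
    """Toy model of binning FlowFiles by correlation value.

    Hypothetical simplification, not NiFi's implementation: a bin completes
    once it holds `entries_per_bin` FlowFiles, Max Bin Age is ignored, and when
    a new bin is needed while all `max_bins` are in use, the oldest open bin
    is evicted before it is complete.
    """

    def __init__(self, max_bins: int, entries_per_bin: int = 2) -> None:
        self.max_bins = max_bins
        self.entries_per_bin = entries_per_bin
        self.bins: OrderedDict[str, list[str]] = OrderedDict()  # oldest bin first
        self.merged: list[tuple[str, list[str]]] = []
        self.evicted: list[tuple[str, list[str]]] = []

    def offer(self, flowfile_id: str, correlation_value: str) -> None:
        if correlation_value not in self.bins:
            if len(self.bins) >= self.max_bins:
                # No free bin: evict the oldest one prematurely (a failed merge).
                self.evicted.append(self.bins.popitem(last=False))
            self.bins[correlation_value] = []  # open a new bin for this value
        contents = self.bins[correlation_value]
        contents.append(flowfile_id)
        if len(contents) >= self.entries_per_bin:
            # Bin reached its entry count: merge it and free the slot.
            self.merged.append((correlation_value, self.bins.pop(correlation_value)))


def simulate(max_bins: int, arrivals: list[tuple[str, str]]) -> tuple[int, int]:
    """Feed (flowfile_id, correlation_value) pairs in order; return (merged, evicted)."""
    manager = BinManager(max_bins)
    for flowfile_id, value in arrivals:
        manager.offer(flowfile_id, value)
    return len(manager.merged), len(manager.evicted)


# Three transactions whose two FlowFiles arrive interleaved.
ARRIVALS = [("a1", "Foo"), ("b1", "Bar"), ("c1", "Baz"),
            ("a2", "Foo"), ("b2", "Bar"), ("c2", "Baz")]
print(simulate(2, ARRIVALS))  # (0, 4): no pair ever lands in the same bin
print(simulate(3, ARRIVALS))  # (3, 0): one bin per in-flight value, all merge
```

In this model, two bins are not enough for any of the three interleaved pairs to meet, while three bins let every pair merge.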

So at a low volume you’re likely not seeing all bins used. But when you increase the volume,
you’re filling all of the bins and failing the merge. So you may want to set it to at least
30, given that you’re indicating that you’ll have up to 30 logs per transaction - or perhaps
a bit more if you want to leave a little extra room for that to change.
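As a rough way to sanity-check a value for “Maximum number of Bins”, one could count how many correlation values are open at the same time for a given arrival order. This sketch assumes the same simplified model (a bin closes once it holds two FlowFiles, no bin age); `peak_open_bins` is a hypothetical helper, not part of NiFi.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def peak_open_bins(correlation_values: Iterable[str], entries_per_bin: int = 2) -> int:
    """Largest number of bins open at the same time for a given arrival order.

    Hypothetical model: a bin opens on the first FlowFile carrying a value and
    closes as soon as it holds `entries_per_bin` FlowFiles. Under this model,
    if the bin limit is at least this peak, no bin is evicted early.
    """
    open_bins: Counter[str] = Counter()
    peak = 0
    for value in correlation_values:
        open_bins[value] += 1
        peak = max(peak, len(open_bins))
        if open_bins[value] >= entries_per_bin:
            del open_bins[value]  # bin complete: its slot is free again
    return peak


# Low volume: each transaction's pair arrives back to back.
print(peak_open_bins(["Foo", "Foo", "Bar", "Bar", "Baz", "Baz"]))  # 1
# Higher volume: 30 transactions interleaved, partners arrive after all 30 firsts.
print(peak_open_bins([f"txn-{i}" for i in range(30)] * 2))  # 30
```

The first case is the low-volume situation where a single bin is enough; the second is the kind of interleaving that needs a bin limit of 30 or more in this model.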

Thanks
-Mark

> On Jun 1, 2020, at 8:33 AM, DAVID SMITH <davidrsmith@btinternet.com.INVALID> wrote:
> 
> Hi
> I have a group of log files coming in via an HTTP listener, up to 30 logs per transaction,
> of which I only need the values that are in 2 of those log files per transaction. After using
> some RouteOnContent processors I end up with the two log flowfiles I want.
> In my current flow I am using a MergeContent processor to try to merge the two required
> flowfiles on a common ident attribute value, which I have extracted from each log file earlier.
> I have also extracted some other attributes from the flowfiles at this point, and as everything
> I am interested in is in these attributes, I don't mind what happens with the content of the flowfiles.
> When I step through the flow, all is fine and it works as I expect. However, when I run it at pace
> and log files are coming in for multiple transactions at the same time, the merge fails on
> most occasions.
> 
> My MergeContent settings are:
>   Merge Strategy: Bin Packing Algorithm
>   Merge Format: Binary Concatenation
>   Attribute Strategy: Keep All Unique Attributes
>   Correlation Attribute Name: ${import.ident}
>   Metadata Strategy: Ignore Metadata
>   Minimum Number of Entries: 2
>   Maximum Number of Entries: 2
>   Max Bin Age: 1 minute
> All the other properties are at default.
> Have I not set something correctly, or is there a simpler way of merging the attributes
> from two flowfiles onto one flowfile?
> Many thanks
> Dave
