nifi-users mailing list archives

From Koji Kawamura <ijokaruma...@gmail.com>
Subject Re: MergeRecord
Date Fri, 13 Apr 2018 07:11:04 GMT
Hi,

I've tested the InferAvroSchema and MergeRecord scenario.
As you described, records are not merged as expected.

In my case, the reason is that InferAvroSchema generates schema text like this:
inferred.avro.schema
{ "type" : "record", "name" : "example", "doc" : "Schema generated by Kite",
  "fields" : [
    { "name" : "Key", "type" : "long", "doc" : "Type inferred from '4'" },
    { "name" : "Value", "type" : "string", "doc" : "Type inferred from 'four'" }
  ] }

MergeRecord uses that schema text as part of the bin group ID, even if
'Correlation Attribute' is specified.
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/MergeRecord.java#L348
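For illustration, here is a minimal Python sketch of that grouping, based on my reading of the linked source rather than an exact port. The helpers `inferred_schema_text`, `group_id`, and `bin_flowfiles` are hypothetical and are not NiFi APIs; `inferred_schema_text` only mimics the shape of the InferAvroSchema output shown above.

```python
import json


def inferred_schema_text(key_sample, value_sample):
    # Hypothetical helper: mimics the InferAvroSchema output quoted above,
    # where each field's "doc" embeds the sample value the type was inferred from.
    return json.dumps({
        "type": "record",
        "name": "example",
        "doc": "Schema generated by Kite",
        "fields": [
            {"name": "Key", "type": "long",
             "doc": f"Type inferred from '{key_sample}'"},
            {"name": "Value", "type": "string",
             "doc": f"Type inferred from '{value_sample}'"},
        ],
    })


def group_id(schema_text, correlation_value=None):
    # Approximation of MergeRecord's grouping: the schema text is always part
    # of the key, and a correlation attribute value (if any) is appended.
    if correlation_value is None:
        return schema_text
    return schema_text + correlation_value


def bin_flowfiles(flowfiles):
    # Group (schema_text, correlation_value) pairs into bins keyed by group ID.
    bins = {}
    for schema_text, correlation_value in flowfiles:
        key = group_id(schema_text, correlation_value)
        bins.setdefault(key, []).append(schema_text)
    return bins


flowfiles = [
    (inferred_schema_text(4, "four"), "batch-1"),
    (inferred_schema_text(5, "five"), "batch-1"),
    (inferred_schema_text(4, "four"), "batch-1"),
]
bins = bin_flowfiles(flowfiles)
print(len(bins))  # 2: same correlation value, but two distinct schema texts
```

Only the FlowFiles whose samples happened to produce identical "doc" text end up in the same bin.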

So even if the schemas are structurally the same, the merge group ID will be
different whenever the sample values embedded in the "doc" fields vary.
If you can use a schema registry instead, it should work as expected.
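A minimal Python sketch of both points, under stated assumptions: `inferred_schema` is a hypothetical helper mimicking the InferAvroSchema output above, `strip_docs` is a hypothetical comparison helper (MergeRecord itself does not ignore "doc" fields), and the `registry` dict is only a stand-in for a schema registry, not a NiFi API.

```python
import json


def inferred_schema(key_sample, value_sample):
    # Hypothetical: same shape as the InferAvroSchema output quoted above.
    return {
        "type": "record",
        "name": "example",
        "doc": "Schema generated by Kite",
        "fields": [
            {"name": "Key", "type": "long",
             "doc": f"Type inferred from '{key_sample}'"},
            {"name": "Value", "type": "string",
             "doc": f"Type inferred from '{value_sample}'"},
        ],
    }


def strip_docs(node):
    # Recursively drop every "doc" entry, keeping only the structural parts.
    if isinstance(node, dict):
        return {k: strip_docs(v) for k, v in node.items() if k != "doc"}
    if isinstance(node, list):
        return [strip_docs(v) for v in node]
    return node


a = inferred_schema(4, "four")
b = inferred_schema(5, "five")

# The raw texts differ, so they would yield different group IDs...
texts_differ = json.dumps(a) != json.dumps(b)
# ...even though the two schemas are structurally identical.
same_structure = strip_docs(a) == strip_docs(b)

# With a registry, every FlowFile points at one stored schema by name,
# so all of them resolve to exactly the same schema text.
registry = {"example": json.dumps(strip_docs(a))}  # hypothetical stand-in
schema_names = ["example", "example", "example"]   # one name per FlowFile
distinct_texts = {registry[name] for name in schema_names}

print(texts_differ, same_structure, len(distinct_texts))  # True True 1
```

Because every FlowFile resolves to the same stored text, they all share one group ID and can land in the same bin.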

Thanks,
Koji

On Fri, Apr 13, 2018 at 2:45 PM, DEHAY Aurelien
<aurelien.dehay@faurecia.com> wrote:
>
> Hello.
>
> Thanks for the answer.
>
> The 20k is just the latest test; I’ve also tested with 100 and 1,000, with an input queue of 10k,
> and it doesn’t change anything.
>
> I will try to simplify the test case and avoid using the inferred schema.
>
> Regards
>
>> On 13 Apr 2018 at 04:50, Koji Kawamura <ijokarumawak@gmail.com> wrote:
>>
>> Hello,
>>
>> I checked your template, but I haven't run the flow since I don't have
>> sample input XML files.
>> However, when I looked at the MergeRecord processor configuration, I found that:
>> Minimum Number of Records = 20000
>> Max Bin Age = 10 sec
>>
>> From a brief look at the MergeRecord source code, it expires a bin that is
>> still incomplete once Max Bin Age has elapsed.
>> Do you always have 20,000 records to merge within a 10-second window?
>> If not, I recommend lowering the minimum number of records.
>>
>> I haven't checked the actual MergeRecord behavior, so I may be wrong, but
>> it's worth changing the configuration.
>>
>> Hope this helps,
>> Koji
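The arithmetic behind that question can be sketched in Python under a simplified model (an assumption, not MergeRecord's actual scheduling): a bin is treated as complete once it holds the minimum number of records, or is expired when it reaches Max Bin Age, whichever comes first. The function names and the 500 records/s arrival rate are hypothetical.

```python
def required_rate(min_records, max_bin_age_sec):
    # Arrival rate (records/s) needed to fill a bin before Max Bin Age expires it.
    return min_records / max_bin_age_sec


def records_per_bin(rate_per_sec, min_records, max_bin_age_sec):
    # Simplified model: the bin closes at min_records or at max_bin_age_sec,
    # whichever comes first.
    return min(min_records, rate_per_sec * max_bin_age_sec)


print(required_rate(20000, 10))          # 2000.0
print(records_per_bin(500, 20000, 10))   # 5000: the bin expires before it fills
```

With the template's settings (20,000 records, 10 sec), a bin only fills before expiring if roughly 2,000 records per second arrive for the same group.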
>>
>>
>> On Fri, Apr 13, 2018 at 12:26 AM, DEHAY Aurelien
>> <aurelien.dehay@faurecia.com> wrote:
>>> Hello.
>>>
>>> Please see the attached template. The problem is that, whatever configuration
>>> we set in MergeRecord, we can't get it to actually merge records.
>>>
>>> All the records have the same format; we use InferAvroSchema so that we don't have to
>>> write the schema ourselves. The only difference between the schemas is then the doc=""
>>> fields. Could that prevent the merging?
>>>
>>> Thanks for any pointer or info.
>>>
>>>
>>> Aurélien DEHAY
>>>
>>>
>>>
>>> This electronic transmission (and any attachments thereto) is intended solely
for the use of the addressee(s). It may contain confidential or legally privileged information.
If you are not the intended recipient of this message, you must delete it immediately and
notify the sender. Any unauthorized use or disclosure of this message is strictly prohibited.
 Faurecia does not guarantee the integrity of this transmission and shall therefore never
be liable if the message is altered or falsified nor for any virus, interception or damage
to your system.
>
