nifi-users mailing list archives

From Mika Borner <n...@my2ndhead.com>
Subject Re: Merging Records
Date Mon, 12 Jun 2017 20:34:50 GMT
Yes, it worked!

Thanks!

Mika>


On 06/12/2017 10:02 PM, Bryan Bende wrote:
> Mika,
>
> Are you receiving the log messages using the ListenTCP processor?
>
> If so, just wanted to mention that there is a property "Max Batch
> Size" that defaults to 1 and will control how many logical TCP
> messages can be written to a single flow file.
>
> If you increase that to, say, 1000, then you can send a flow file with
> 1000 log messages to the next record-based processor with the
> GrokReader.
>
> -Bryan
>
>
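
To illustrate Bryan's point above, here is a minimal Python sketch of a sender that writes newline-delimited log messages to the port ListenTCP is listening on; the host, port, and message contents are made-up assumptions for illustration. With "Max Batch Size" raised to 1000, up to 1000 of these messages would be written into a single flow file.

    import socket

    # Hypothetical host/port where the ListenTCP processor is listening.
    NIFI_HOST = "localhost"
    NIFI_PORT = 6514

    def send_log_lines(lines):
        # ListenTCP treats each newline-terminated line as one logical
        # message, so each line here becomes one message on the NiFi side.
        with socket.create_connection((NIFI_HOST, NIFI_PORT)) as sock:
            for line in lines:
                sock.sendall(line.encode("utf-8") + b"\n")

    send_log_lines(["Jun 12 20:34:50 myhost app: something happened"] * 10)
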
> On Mon, Jun 12, 2017 at 3:51 PM, Mark Payne <markap14@hotmail.com> wrote:
>> Mika,
>>
>> Understood. The JIRA for this is NIFI-4060 [1]. MergeContent is likely the
>> best option for the short term, merging with a demarcator of \n (you can
>> press Shift + Enter/Return to insert a newline in the UI), if that works
>> for your format.
>>
>> Thanks
>> -Mark
>>
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-4060
>>
>>
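
As a rough picture of what the suggested MergeContent configuration produces: with a demarcator of \n, the merged flow file is simply the incoming flow file contents joined by newlines. A small Python sketch of the resulting payload, using made-up record contents:

    import json

    # Hypothetical single-record flow files, one JSON record each, as
    # emitted before merging.
    records = [
        {"timestamp": "2017-06-12T20:34:50Z", "message": "login ok"},
        {"timestamp": "2017-06-12T20:34:51Z", "message": "login failed"},
    ]

    # MergeContent with a demarcator of "\n" concatenates the contents
    # with a newline between them, giving one payload that a record
    # reader can consume as a stream.
    merged_payload = "\n".join(json.dumps(r) for r in records)
    print(merged_payload)
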
>> On Jun 12, 2017, at 3:36 PM, Mika Borner <nifi@my2ndhead.com> wrote:
>>
>> Hi Mark
>>
>> Yes, this makes sense.
>>
>> In my case, I'm receiving single log events from a TCP input which I would
>> like to process further with record processors. This is probably an edge
>> case where a record merger would make sense to make the post-processing
>> more efficient.
>>
>> Good to hear it's already on the radar :-)
>>
>> Mika>
>>
>>
>>
>> On 06/12/2017 09:23 PM, Mark Payne wrote:
>>
>> Hi Mika,
>>
>> You're correct that there is not yet a MergeRecord processor. It is on my
>> personal radar,
>> but I've not yet gotten to it. One of the main reasons that I've not
>> prioritized this yet is that
>> typically in this record-oriented paradigm, you'll see data coming in and
>> being processed in groups. MergeContent has largely been useful in cases
>> where we split data apart (using processors like SplitText, for example),
>> and then merge it back together later.
>> I don't see this as being quite as prominent when using record readers and
>> writers, as the
>> readers are designed to handle streams of data instead of individual records
>> as FlowFiles.
>>
>> That being said, there are certainly cases where MergeRecord still makes
>> sense. For example,
>> when you're ingesting small payloads or want to batch up to send to
>> something like HDFS, which
>> prefers larger files, etc. So I'll hopefully have a chance to start working
>> on that this week or next.
>>
>> In the meantime, the best path forward for you may be to use MergeContent
>> to concatenate a bunch
>> of data before the processor that is using the Grok Reader. Or, if you are
>> splitting the data up
>> into individual records yourself, I would recommend not splitting them up at
>> all.
>>
>> Does this make sense?
>>
>> Thanks
>> -Mark
>>
>>
>> On Jun 12, 2017, at 3:12 PM, Mika Borner <nifi@my2ndhead.com> wrote:
>>
>> Hi,
>>
>> What is the best way to merge records? I'm using a GrokReader that spits
>> out single JSON records. For efficiency I would like to merge a few hundred
>> records into one flow file. It seems there's no MergeRecord processor yet...
>>
>> Thanks!
>>
>> Mika>
>>
>>
>>
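
To make the original question more concrete: a GrokReader turns each matching log line into one structured record. The sketch below imitates that with a plain regular expression using named groups in place of a real grok pattern; the log format and field names are assumptions for illustration, not taken from the thread.

    import json
    import re

    # Stand-in for a grok expression: named groups play the role of
    # grok's %{...:field} captures.
    LOG_PATTERN = re.compile(
        r"(?P<timestamp>\S+ \d+ [\d:]+) (?P<host>\S+) (?P<message>.*)"
    )

    def parse_line(line):
        # Each parsed line becomes one JSON record; without batching or
        # merging, each record would travel in its own flow file.
        match = LOG_PATTERN.match(line)
        return json.dumps(match.groupdict()) if match else None

    print(parse_line("Jun 12 20:34:50 myhost something happened"))
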

