nifi-users mailing list archives

From Matt Burgess <mattyb...@apache.org>
Subject Re: SplitJson:GC Overhead Limit Exceeded
Date Thu, 17 Nov 2016 16:51:35 GMT
If we consider streaming for SplitJson (or a new version of it), we
wouldn't be able to support the "micro-batch" functionality as it exists in
SplitJson today (the fragment.count attribute, for example).
That might not be a concern, or it might warrant a new processor
(e.g. SplitJsonStreaming).
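
To make the trade-off concrete, here is a minimal sketch of streaming a
top-level JSON array one element at a time. It is an illustration only, not
NiFi code: it uses Python's standard json module rather than the JsonPath
library behind SplitJson, and the names iter_json_array and chunk_size are
hypothetical. It assumes the input is a text stream holding a single JSON
array.

```python
import io
import json


def iter_json_array(stream, chunk_size=65536):
    """Yield the elements of a top-level JSON array from a text stream.

    Only the unread part of the current buffer and one decoded element are
    held in memory. Comma placement is not validated (sketch only).
    """
    decoder = json.JSONDecoder()
    buf = ""
    started = False  # True once the opening '[' has been consumed
    eof = False

    while True:
        buf = buf.lstrip()
        if buf and not started:
            if buf[0] != "[":
                raise ValueError("expected a top-level JSON array")
            buf, started = buf[1:], True
            continue
        if buf and buf[0] == "]":
            return
        if buf and buf[0] == ",":
            buf = buf[1:]
            continue
        if buf:
            try:
                value, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                end = None  # element is probably cut off at the chunk boundary
            if end is not None:
                rest = buf[end:].lstrip()
                # Accept the value only when a delimiter follows it; otherwise
                # a number such as 12 might really be the start of 123.
                if rest[:1] in (",", "]"):
                    buf = rest
                    yield value
                    continue
        # No progress was possible with the current buffer: read more input.
        if eof:
            raise ValueError("truncated or malformed JSON array")
        chunk = stream.read(chunk_size)
        if chunk == "":
            eof = True
        buf += chunk


if __name__ == "__main__":
    doc = io.StringIO(json.dumps([{"id": i} for i in range(5)]))
    count = 0
    for index, element in enumerate(iter_json_array(doc, chunk_size=16)):
        # Each element could be emitted as its own split here, but the total
        # is still unknown at this point.
        count = index + 1
    print("total elements, known only at the end:", count)
```

Each element can be handed off as soon as it is parsed, but the total
count is available only after the closing bracket has been read, which is
why an attribute like fragment.count cannot be set on the first splits
without buffering them.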

Regards,
Matt

On Thu, Nov 17, 2016 at 11:36 AM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
> The backing library of the Json processors does indeed require loading the
> entire doc into memory. We should make sure this consideration is documented
> if not already.
>
> It could be an interesting idea to not tie SplitJson to this library, given
> that it might not need all the functionality of JsonPath and would likely be
> a good candidate for streaming.
> On Thu, Nov 17, 2016 at 11:23 Mark Payne <markap14@hotmail.com> wrote:
>>
>> Hi Mike,
>>
>> Certainly, I would recommend trying to change the max heap to, say, 2 GB
>> and see if that gives you what you need.
>> Looking at the code, it does look like this Processor may not be the most
>> efficient in how it is parsing the JSON.
>> There are libraries, for example, that provide a "Streaming JSON"
>> interface, but this Processor loads the entire JSON
>> into heap and then creates an Object Model from it.
>>
>> Also, what do you have set for the Max Concurrent Tasks? If you have
>> multiple threads simultaneously running, you could
>> have each one using up quite a lot of heap.
>>
>> Thanks
>> -Mark
>>
>>
>> On Nov 17, 2016, at 10:54 AM, Mike Harding <mikeyharding@gmail.com> wrote:
>>
>> Just for info, in bootstrap.conf my heap size is as follows:
>>
>> java.arg.2=-Xms512m
>>
>> java.arg.3=-Xmx512m
>>
>> Would it be a simple case of increasing this? The size of the flowfile
>> json array is 35MB.
>>
>> Mike
>>
>>
>>
>> On 17 November 2016 at 15:47, Mike Harding <mikeyharding@gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> I have a flowfile containing a JSON array with 30k objects that I am
>>> trying to split into separate flowfiles for downstream processing.
>>>
>>> The problem is the processor reports a GC Overhead Limit Exceeded warning
>>> and administratively yields.
>>>
>>> Is there any way of setting up a back pressure option or some changes to
>>> the nifi config to best address this?
>>>
>>> Thanks,
>>> Mike
>>
>>
>>
>