nifi-issues mailing list archives

From "Mark Payne (Jira)" <>
Subject [jira] [Commented] (NIFI-7874) S3List processor in v1.12.1 uses lots of CPU power and RAM
Date Fri, 02 Oct 2020 14:45:00 GMT


Mark Payne commented on NIFI-7874:

Thanks for reporting [~dreseldo]! Definitely a regression that was introduced in 1.12.0. I
have a PR up.

> S3List processor in v1.12.1 uses lots of CPU power and RAM
> ----------------------------------------------------------
>                 Key: NIFI-7874
>                 URL:
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.12.1
>         Environment: Centos 7, Amazon Cloud, 8 CPU cores, 64 GB RAM
>            Reporter: Dominik Dresel
>            Priority: Major
>             Fix For: 1.11.4
>          Time Spent: 10m
>  Remaining Estimate: 0h
> We are using the S3List processor to collect our log data from S3 and process it further.
> In NiFi version 1.11.4 the processor reads a log file from S3, creates a flow file from it,
> routes it to success, and repeats the loop from the beginning. This is fast and does not need
> a lot of resources. We can operate NiFi with the default 512 MB of RAM and 8 CPU cores, which
> are utilized at roughly 50%.
> With the new version of the S3List processor (v1.12.1) the flow files seem to get cached
> in memory while the files on S3 are enumerated. Because of this, we set the Xmx and Xms parameters
> in bootstrap.conf to 4 GB, which does not suffice (we get an exception from AWS after some time).
> While the collection of the S3 entries is in progress, all 8 CPU cores are utilized
> at 100% and the RAM gets eaten up. This is especially bad because NiFi then does not have
> the resources to contact its external ZooKeeper and gets kicked out of the cluster. It
> is also no longer possible to use the web UI.
> This behavior won't show up if you have just a few objects in S3, because they can easily
> be cached in memory, but we have millions of entries in our S3, which will eat up the RAM of
> the machine.
> Maybe it would be a good idea to have an additional parameter for the processor that
> sets the number of created flow files after which they are routed to success.
> If you need any more log files I would be happy to provide them!
> BTW: NiFi is great :) Very easy to use and (normally) very economical about resources.
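
The reporter's suggestion, routing flow files to success after a configurable number have been created instead of holding the whole listing in memory, can be sketched roughly as follows. This is a plain-Java illustration only, not NiFi code and not the fix in the PR: the class `BatchedListing`, the method `emitInBatches`, and the `batchSize` parameter are hypothetical names, and the `sink` stands in for whatever would transfer the flow files and commit the session.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

// Hypothetical sketch: none of these names exist in NiFi.
public class BatchedListing {

    /**
     * Drains the iterator and hands entries to the sink in batches of at most
     * batchSize, so no more than batchSize entries are buffered at once.
     * Returns the number of batches handed to the sink.
     */
    public static <T> int emitInBatches(Iterator<T> entries, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int batches = 0;
        while (entries.hasNext()) {
            buffer.add(entries.next());
            if (buffer.size() == batchSize) {
                // Hand over a copy, then release the buffered entries.
                sink.accept(new ArrayList<>(buffer));
                buffer.clear();
                batches++;
            }
        }
        if (!buffer.isEmpty()) {
            // Flush the final, partially filled batch.
            sink.accept(new ArrayList<>(buffer));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for a lazily paged S3 listing (assumed object keys).
        Iterator<String> keys = IntStream.range(0, 10)
                .mapToObj(i -> "logs/file-" + i + ".gz")
                .iterator();
        int batches = emitInBatches(keys, 4,
                batch -> System.out.println("routing " + batch.size() + " entries to success"));
        System.out.println("batches: " + batches);
    }
}
```

With a bounded batch size, memory use depends on the batch size and not on the total number of objects in the bucket, which is the property the report asks for.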

This message was sent by Atlassian Jira