nifi-dev mailing list archives

From 尹文才 <batman...@gmail.com>
Subject Re: Is there a configuration to limit the size of nifi's flowfile repository
Date Fri, 27 Apr 2018 01:08:33 GMT
Hi guys, I checked the soft and hard file handle limits on my CentOS 7
machine. They are 1024 and 4096, so my case falls under the third scenario
that Mark described.
Pierre, we use NiFi as an ETL tool to extract data from SQL Server for data
analysis. As far as I know, the total number of flowfiles inside the flow
is around 200 to 300, and each one carries 9 to 10 simple string
attributes.
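
In case it helps anyone reading the thread later, the same check can be
scripted. This is only a minimal sketch, assuming a Unix-like host with
Python 3. The function name check_file_handle_limits is made up for
illustration, and the 50,000 threshold is the figure from Mark's message
quoted below, not something the script discovers on its own.

```python
import resource

# Assumption: 50,000 is the minimum recommended in Mark's message below.
RECOMMENDED_MIN = 50_000


def check_file_handle_limits(soft, hard, recommended=RECOMMENDED_MIN):
    """Return a warning string for each limit below the recommended minimum."""
    warnings = []
    # RLIM_INFINITY means "no limit", so it never triggers a warning.
    if soft != resource.RLIM_INFINITY and soft < recommended:
        warnings.append(f"soft limit {soft} is below {recommended}")
    if hard != resource.RLIM_INFINITY and hard < recommended:
        warnings.append(f"hard limit {hard} is below {recommended}")
    return warnings


if __name__ == "__main__":
    # Same values that "ulimit -Sn" and "ulimit -Hn" report for this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    for line in check_file_handle_limits(soft, hard) or ["limits look fine"]:
        print(line)
```

Note that this reads the limits of the process running the script, so it
should be run as the same user that runs NiFi to be meaningful.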

Regards,
Ben

2018-04-26 21:25 GMT+08:00 Mark Payne <markap14@hotmail.com>:

> Ben,
>
> There are three things that I've seen cause really massive FlowFile
> Repositories:
>
> 1) OutOfMemoryError occurs that causes NiFi to stop working properly.
> 2) The "nifi.flowfile.repository.checkpoint.interval" property is set
> really long (2 mins is the default).
> 3) By far the most common: the system runs out of available file
> handles. You can check the hard and soft file handle limits by running
> "ulimit -Hn" and "ulimit -Sn". We recommend setting at least 50,000,
> but the default on most Linux-based operating systems is much smaller,
> like 4,096. The Admin Guide [1] will guide you through increasing this
> value, if this is the problem.
>
> Thanks
> -Mark
>
> [1] http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>
>
>
> On Apr 26, 2018, at 5:26 AM, 尹文才 <batman713@gmail.com> wrote:
>
> Hi guys, thanks for all your answers. I have actually seen the flowfile
> repo on one of our OpenStack CentOS 7 machines grow to about 30 GB. It used
> up all the disk space allocated to the virtual machine, so the flow inside
> NiFi couldn't proceed and many errors started to appear, such as failures
> to checkpoint. We currently use NiFi as an ETL tool to extract data from
> SQL Server for data analysis.
> I have no idea why the flowfile repo would grow like this; as I understand
> it, it is only used to store flowfile attributes. It would be great if
> there were an option to limit the flowfile repo size.
>
> Thanks.
> Regards,
> Ben
>
> 2018-04-26 2:08 GMT+08:00 Brandon DeVries <brd@jhu.edu>:
>
> All,
>
> This is something I think we shouldn't dismiss so easily.  While the
> FlowFile repo is lighter than the content repo, allowing it to grow too
> large can cause major problems.
>
> Specifically, an "overgrown" FlowFile repo may prevent a NiFi instance from
> coming back up after a restart due to the way in which records are held in
> memory.  If there is more memory available to give to the JVM, this can
> sometimes be worked around... but if there isn't you may just be out of
> luck.  For that matter, allowing the FlowFile repo to grow so large that it
> consumes all the heap isn't going to be good for system health in general
> (OOM is probably never where you want to be...).
>
> To Pierre's point "you don't want to limit that repository in size since it
> would prevent the workflows to create new flow files"... that's exactly why
> I would want to limit the size of the repo.  You do then get into questions
> of how exactly to do this.  For example, you may not want to simply block
> all transactions that create a FlowFile, because it may remove even more
> (e.g. MergeContent).  Additionally, you have to be concerned about
> deadlocks (e.g. a "Wait" that hangs forever because its "Notify" is being
> starved).  Or, perhaps that's all you can do... freeze everything at some
> threshold prior to actual damage being done, and alert operators that
> manual intervention is necessary (e.g. bring up the graph with
> autoResume=false, and bleed off data in a controlled fashion).
>
> In summary, I believe this is a problem.  Even if it doesn't come up often,
> when it does it is significant.  While the solution likely isn't simple,
> it's worth putting some thought towards.
>
> Brandon
>
> On Wed, Apr 25, 2018 at 9:43 AM Sivaprasanna <sivaprasanna246@gmail.com>
> wrote:
>
> No, he actually had mentioned “like content repository”. The answer is,
> there aren’t any properties that support this, AFAIK. Pierre’s response
> pretty much sums up why there aren’t any properties.
>
> Thanks,
> Sivaprasanna
>
> On Wed, 25 Apr 2018 at 7:10 PM, Mike Thomsen <mikerthomsen@gmail.com>
> wrote:
>
> I have a feeling that what Ben meant was how to limit the content
> repository size.
>
> On Wed, Apr 25, 2018 at 8:26 AM Pierre Villard
> <pierre.villard.fr@gmail.com> wrote:
>
> Hi Ben,
>
> Since the flow file repository contains the information of the flow files
> currently being processed by NiFi, you don't want to limit that repository
> in size since it would prevent the workflows to create new flow files.
>
> Besides this repository is very lightweight, I'm not sure it'd need to be
> limited in size.
> Do you have a specific use case in mind?
>
> Pierre
>
>
> 2018-04-25 9:15 GMT+02:00 尹文才 <batman713@gmail.com>:
>
> Hi guys, I checked NiFi's System Administrator's Guide looking for a
> configuration item that limits the size of the flowfile repository,
> similar to the ones for the other repositories (e.g. the content
> repository), but I didn't find one. Is there currently any configuration
> to limit the flowfile repository's size? Thanks.
>
> Regards,
> Ben
>
