nifi-users mailing list archives

From Olav Jordens <Olav.Jord...@2degreesmobile.co.nz>
Subject RE: FW: Content repository filling up
Date Tue, 25 Apr 2017 02:10:58 GMT
Hi Joe,

Thanks so much for your quick response. My content repository has about 400 GB of storage, and
large files are about 1 to 2 GB in size. There are many small flow files containing JSON or SQL
content, generally less than 1 KB each. These smaller files are used to route the flowfile
correctly. They are generated quite quickly, and sometimes a few thousand are queued while the
larger files are processed.
The surprising thing for me is that even with nifi.content.claim.max.flow.files=1
I still see multiple pieces of content within a single file in the content repository. Some
are small bits of SQL / JSON, and others are huge 1 GB text files.

Thanks,
Olav

From: Joe Witt [mailto:joe.witt@gmail.com]
Sent: Tuesday, 25 April 2017 1:35 p.m.
To: users@nifi.apache.org
Subject: Re: FW: Content repository filling up

Olav,

How large is your content repository?

How large is a large file?

How many transformation steps exist in your flow from receipt through delivery of that large
file?

Thanks
Joe

On Mon, Apr 24, 2017 at 9:31 PM, Olav Jordens <Olav.Jordens@2degreesmobile.co.nz> wrote:
Apologies, I forgot to mention that I am on NiFi 1.1.2 on Linux RHEL 6.5.

Thanks,
Olav




Olav Jordens
Senior ETL Developer
+64 226 202 429
+64 9 919 7000
2degreesmobile.co.nz

Two Degrees Mobile Limited | 47-49 George Street | Newmarket | Auckland | New Zealand
PO Box 8355 | Symonds Street | Auckland 1150 | New Zealand | Fax +64 9 919 7001

From: Olav Jordens
Sent: Tuesday, 25 April 2017 1:27 p.m.
To: users@nifi.apache.org
Subject: Content repository filling up

Hi Users,

I have had this problem intermittently for some time now – the content repository disk fills
up even though there appear to be very few flow files in the system.
I have read the very good explanation of content claims here: https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html

My data flow includes a mix of very large and very small files, and so I suspect that the
small files within a claim are locking the large ones. I have followed the suggestion in the
above link:

If you are working with data that ranges greatly from very small to very large, you may want
to decrease the max appendable size and/or max flow file settings. By doing so you decrease
the number of FlowFiles that make it into a single claim. This in turns reduces the likelihood
of a single piece of data keeping large amounts of data still active in your content repository.

I have tried the most radical approach – one content claim per file which I believe should
imply that as soon as a large file leaves the flow, it is available for removal as I have
set archiving to false.
My issue is that even with these settings, the nifi content repository fills up, and when
I look inside the content repository, I see multiple flowfile contents contained within a
single claim file, which is unexpected as I have set nifi.content.claim.max.flow.files=1.
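
One way to make this observation concrete is to list the largest files under the content repository directory. The following is a minimal sketch, not a NiFi tool: it only walks a directory tree and reports file sizes, and it assumes nothing about how NiFi lays out or names its claim files. The function name `largest_files` is hypothetical; the path in the usage section is the one from my nifi.properties.

```python
import os


def largest_files(root, top_n=10):
    """Return up to top_n (size_in_bytes, path) tuples under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file may have been removed between listing and stat.
                continue
    sizes.sort(reverse=True)
    return sizes[:top_n]


if __name__ == "__main__":
    # Directory taken from nifi.content.repository.directory.default in this message.
    for size, path in largest_files("/app/nifi/common/content_repository"):
        print(f"{size / 1024 ** 2:10.1f} MiB  {path}")
```

Running this while the disk is filling shows which files on disk account for the space, which can then be compared with what is actually queued in the flow.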


These are my content repository settings in nifi.properties:

# Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
# Exceptionally important to get this right when having a mix of large and small files
# We don't want a large file to be in the same claim as a small file which remains queued:
# The claim can never be released until the small file is no longer enqueued and has been released
# Large files, first into a claim, will take up an entire claim anyway.
# So setting max.flow.files=1, there is no need to configure max.appendable.size
nifi.content.claim.max.appendable.size=10 MB
#nifi.content.claim.max.flow.files=100
nifi.content.claim.max.flow.files=1

#OPT
#nifi.content.repository.directory.default=./content_repository
nifi.content.repository.directory.default=/app/nifi/common/content_repository

# Archiving of content is disabled - no need to keep data hanging around once the flow is complete.
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
#nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.enabled=false
nifi.content.repository.always.sync=false
nifi.content.viewer.url=/nifi-content-viewer/
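
Because the file mixes commented-out and active lines for the same key, it can help to confirm which values are actually in effect. The sketch below is an assumption-laden illustration: it parses simple `key=value` lines only (not the full Java properties syntax), treats lines starting with `#` as comments, and lets a later duplicate override an earlier one. The helper names `read_properties` and `content_repo_settings` are hypothetical, and `SAMPLE` is a shortened copy of the settings above.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()  # a later duplicate overrides an earlier one
    return props


def content_repo_settings(props):
    """Keep only the content claim and content repository keys."""
    prefixes = ("nifi.content.claim.", "nifi.content.repository.")
    return {k: v for k, v in props.items() if k.startswith(prefixes)}


# Shortened copy of the settings quoted in this message.
SAMPLE = """\
nifi.content.claim.max.appendable.size=10 MB
#nifi.content.claim.max.flow.files=100
nifi.content.claim.max.flow.files=1
#nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.enabled=false
"""

for key, value in sorted(content_repo_settings(read_properties(SAMPLE)).items()):
    print(f"{key} = {value}")
```

This only shows what the file says. It does not confirm what the running NiFi instance has loaded, which requires a restart after any change to nifi.properties.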

Am I looking at this incorrectly?

Thanks,
Olav



