nifi-dev mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: Content Repository Cleanup
Date Sun, 11 Dec 2016 17:17:36 GMT
Alan,


It's possible that you've run into some sort of bug that is preventing
it from cleaning up the Content Repository properly. While it's stuck
in this state, could you capture a thread dump (bin/nifi.sh dump thread-dump.txt)?
That would help us determine if there is something going on that is
preventing the cleanup from happening.
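
For example, from the NiFi installation directory (the output filename here is
just an example; if you omit it, I believe the dump is written to the
bootstrap log instead):

./bin/nifi.sh dump thread-dump.txt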


Thanks

-Mark


________________________________
From: Alan Jackoway <alanj@cloudera.com>
Sent: Sunday, December 11, 2016 11:11 AM
To: dev@nifi.apache.org
Subject: Re: Content Repository Cleanup

This just filled up again even
with nifi.content.repository.archive.enabled=false.

On the node that is still alive, our queued flowfiles are 91 / 16.47 GB,
but the content repository directory is using 646 GB.
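
For reference, the 646 GB figure is just the on-disk size of the repository
directory (./content_repository, our nifi.content.repository.directory.default
from the nifi.properties quoted below), measured with something like:

du -sh ./content_repository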

Is there a property I can set to make it clean things up more frequently? I
expected that once I turned archive enabled off, it would delete things
from the content repository as soon as the flow files weren't queued
anywhere. So far the only way I have found to reliably get nifi to clear
out the content repository is to restart it.
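
For anyone else following along, the only knobs I know of here are the archive
settings in nifi.properties (the values below are copied from our config,
quoted further down this thread, not recommendations), and per Joe Witt's
earlier reply they only govern archived data, not content still reachable from
queued flow files:

nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%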

Our version string is the following, if that interests you:
11/26/2016 04:39:37 PST
Tagged nifi-1.1.0-RC2
From ${buildRevision} on branch ${buildBranch}

Maybe we will go to the released 1.1 and see if that helps. Until then I'll
be restarting a lot and digging into the code to figure out where this
cleanup is supposed to happen. Any pointers on code/configs for that would
be appreciated.
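
Since our nifi.properties points the content repository implementation at
org.apache.nifi.controller.repository.FileSystemRepository, and the "archiving
file" log lines further down come from that same class, I assume that is where
the cleanup logic lives. A quick way to locate it in a checkout of the NiFi
source (just a grep, not an authoritative pointer):

grep -rl "class FileSystemRepository" --include='*.java' .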

Thanks,
Alan

On Sun, Dec 11, 2016 at 8:51 AM, Joe Gresock <jgresock@gmail.com> wrote:

> No, in my scenario a server restart would not affect the content repository
> size.
>
> On Sun, Dec 11, 2016 at 8:46 AM, Alan Jackoway <alanj@cloudera.com> wrote:
>
> > If we were in the situation Joe G described, should we expect that when we
> > kill and restart nifi it would clean everything up? That behavior has been
> > consistent every time - when the disk hits 100%, we kill nifi, delete
> > enough old content files to bring it back up, and before it brings the UI
> > up it deletes things to get within the archive policy again. That sounds
> > less like the files are stuck and more like it failed trying.
> >
> > For now I just turned off archiving, since we don't really need it for
> > this use case.
> >
> > I attached a jstack from last night's failure, which looks pretty boring
> > to me.
> >
> > On Sun, Dec 11, 2016 at 1:37 AM, Alan Jackoway <alanj@cloudera.com> wrote:
> >
> >> The scenario Joe G describes is almost exactly what we are doing. We
> >> bring in large files and unpack them into many smaller ones. In the most
> >> recent iteration of this problem, I saw that we had many small files
> >> queued up at the time trouble was happening. We will try your suggestion
> >> to see if the situation improves.
> >>
> >> Thanks,
> >> Alan
> >>
> >> On Sat, Dec 10, 2016 at 6:57 AM, Joe Gresock <jgresock@gmail.com> wrote:
> >>
> >>> Not sure if your scenario is related, but one of the NiFi devs recently
> >>> explained to me that the files in the content repository are actually
> >>> appended together with other flow file content (please correct me if I'm
> >>> explaining it wrong).  That means if you have many small flow files in
> >>> your current backlog, and several large flow files have recently left
> >>> the flow, the large ones could still be hanging around in the content
> >>> repository as long as the small ones are still there, if they're in the
> >>> same appended files on disk.
> >>>
> >>> This scenario recently happened to us: we had a flow with ~20 million
> >>> tiny flow files queued up, and at the same time we were also processing
> >>> a bunch of 1GB files, which left the flow quickly.  The content
> >>> repository was much larger than what was actually being reported in the
> >>> flow stats, and our disks were almost full.  On a hunch, I tried the
> >>> following strategy:
> >>> - MergeContent the tiny flow files using flow-file-v3 format (to
> >>>   capture all attributes)
> >>> - MergeContent 10,000 of the packaged flow files using tar format for
> >>>   easier storage on disk
> >>> - PutFile into a directory
> >>> - GetFile from the same directory, but using back pressure from here on
> >>>   out (so that the flow simply wouldn't pull the same files from disk
> >>>   until it was really ready for them)
> >>> - UnpackContent (untar them)
> >>> - UnpackContent (turn them back into flow files with the original
> >>>   attributes)
> >>> - Then do the processing they were originally designed for
> >>>
> >>> This had the effect of very quickly reducing the size of my content
> >>> repository to very nearly the actual size I saw reported in the flow,
> >>> and my disk usage dropped from ~95% to 50%, which is the configured
> >>> content repository max usage percentage.  I haven't had any problems
> >>> since.
> >>>
> >>> Hope this helps.
> >>> Joe
> >>>
> >>> On Sat, Dec 10, 2016 at 12:04 AM, Joe Witt <joe.witt@gmail.com> wrote:
> >>>
> >>> > Alan,
> >>> >
> >>> > That retention percentage only has to do with the archive of data,
> >>> > which kicks in once a given chunk of content is no longer reachable
> >>> > by active flowfiles in the flow.  For it to grow to 100% typically
> >>> > would mean that you have data backlogged in the flow that accounts
> >>> > for that much space.  If that is certainly not the case for you then
> >>> > we need to dig deeper.  If you could do screenshots or share log
> >>> > files and stack dumps around this time those would all be helpful.
> >>> > If the screenshots and such are too sensitive please just share as
> >>> > much as you can.
> >>> >
> >>> > Thanks
> >>> > Joe
> >>> >
> >>> > On Fri, Dec 9, 2016 at 9:55 PM, Alan Jackoway <alanj@cloudera.com> wrote:
> >>> > > One other note on this, when it came back up there were tons of
> >>> > > messages like this:
> >>> > >
> >>> > > 2016-12-09 18:36:36,244 INFO [main] o.a.n.c.repository.FileSystemRepository
> >>> > > Found unknown file /path/to/content_repository/498/1481329796415-87538
> >>> > > (1071114 bytes) in File System Repository; archiving file
> >>> > >
> >>> > > I haven't dug into what that means.
> >>> > > Alan
> >>> > >
> >>> > > On Fri, Dec 9, 2016 at 9:53 PM, Alan Jackoway <alanj@cloudera.com> wrote:
> >>> > >
> >>> > >> Hello,
> >>> > >>
> >>> > >> We have a node on which the nifi content repository keeps growing to
> >>> > >> use 100% of the disk. It's a relatively high-volume process. It
> >>> > >> chewed through more than 100GB in the three hours between when we
> >>> > >> first saw it hit 100% of the disk and when we just cleaned it up
> >>> > >> again.
> >>> > >>
> >>> > >> We are running nifi 1.1 for this. Our nifi.properties looked like
> >>> > >> this:
> >>> > >>
> >>> > >> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
> >>> > >> nifi.content.claim.max.appendable.size=10 MB
> >>> > >> nifi.content.claim.max.flow.files=100
> >>> > >> nifi.content.repository.directory.default=./content_repository
> >>> > >> nifi.content.repository.archive.max.retention.period=12 hours
> >>> > >> nifi.content.repository.archive.max.usage.percentage=50%
> >>> > >> nifi.content.repository.archive.enabled=true
> >>> > >> nifi.content.repository.always.sync=false
> >>> > >>
> >>> > >> I just bumped the retention period down to 2 hours, but should max
> >>> > >> usage percentage protect us from using 100% of the disk?
> >>> > >>
> >>> > >> Unfortunately we didn't get jstacks on either failure. If it hits
> >>> > >> 100% again I will make sure to get that.
> >>> > >>
> >>> > >> Thanks,
> >>> > >> Alan
> >>> > >>
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> I know what it is to be in need, and I know what it is to have plenty.
> >>> I have learned the secret of being content in any and every situation,
> >>> whether well fed or hungry, whether living in plenty or in want.  I can
> >>> do all this through him who gives me strength.    *-Philippians 4:12-13*
> >>>
> >>
> >>
> >
>
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.    *-Philippians 4:12-13*
>
