lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: debugging growing index size
Date Fri, 13 Nov 2015 17:05:29 GMT
Did you disable unmapping using MMapDirectory#setUseUnmap()? By default it should be enabled,
but maybe you disabled it for some reason?

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
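A minimal sketch for checking both values discussed in this thread (Uwe's unmap question and Mike's request to print MMapDirectory.UNMAP_SUPPORTED), assuming Lucene 5.x on the classpath. In the 5.x API the setter is MMapDirectory#setUseUnmap(boolean) with a matching getUseUnmap(), and UNMAP_SUPPORTED is a public static constant. The class name UnmapCheck and the default path are illustrative only.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.lucene.store.MMapDirectory;

public class UnmapCheck {
  /** True only if the JVM supports unmapping and this directory has it turned on. */
  public static boolean unmapActive(MMapDirectory dir) {
    return MMapDirectory.UNMAP_SUPPORTED && dir.getUseUnmap();
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical index location: pass the real path as the first argument.
    Path indexPath = Paths.get(args.length > 0 ? args[0] : "/tmp/index");
    try (MMapDirectory dir = new MMapDirectory(indexPath)) {
      System.out.println("UNMAP_SUPPORTED = " + MMapDirectory.UNMAP_SUPPORTED);
      System.out.println("getUseUnmap()   = " + dir.getUseUnmap());
      System.out.println("unmap active    = " + unmapActive(dir));
    }
  }
}
```

If unmapping is not active, closed inputs stay mapped until the garbage collector frees their buffers, so space held by deleted-but-still-mapped files is only released at that point.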

> -----Original Message-----
> From: Rob Audenaerde [mailto:rob.audenaerde@gmail.com]
> Sent: Friday, November 13, 2015 5:24 PM
> To: java-user@lucene.apache.org
> Subject: Re: debugging growing index size
> 
> I'm currently running with NIOFSDirectory, and that seems to prevent the
> issue from appearing.
> 
> This is a second run (with deletes applied, etc.):
> 
> raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
> -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
> 
> raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
> -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
> 
> 
> 
> On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
> 
> > Hi Rob,
> >
> > A couple more things:
> >
> > Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
> >
> > Also, can you try your test using NIOFSDirectory instead?  Curious if
> > that changes things...
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
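Trying NIOFSDirectory, as suggested here, only changes how the Directory is constructed. A sketch assuming Lucene 5.x; the helper name openIndexDir and the useNio flag are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;

public class DirectoryChoice {
  /**
   * Hypothetical helper: opens the same index path with either implementation,
   * so the rest of the application (IndexWriter, readers) stays unchanged.
   */
  public static Directory openIndexDir(Path path, boolean useNio) throws IOException {
    // NIOFSDirectory reads via FileChannel positional reads (open file descriptors);
    // MMapDirectory maps the files into the process address space ("mem" in lsof).
    return useNio ? new NIOFSDirectory(path) : new MMapDirectory(path);
  }
}
```

Because NIOFSDirectory does not memory-map, its open files appear in lsof as ordinary file descriptors rather than "mem" entries, which matters when comparing lsof output between runs.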
> > On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
> > <rob.audenaerde@gmail.com> wrote:
> > > Curious indeed!
> > >
> > > I will turn on IndexFileDeleter.VERBOSE_REF_COUNTS and recreate the
> > > logs. Hopefully I'll get back with them within a day.
> > >
> > > Thanks for the extra logging!
> > >
> > > -Rob
> > >
> > > On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
> > > lucene@mikemccandless.com> wrote:
> > >
> > >> Hmm, curious.
> > >>
> > >> I looked at the [large] infoStream output and I see segment _3ou7
> > >> present on init of IW, a few getReader calls referencing it, then a
> > >> forceMerge that indeed merges it away, yet I do NOT see IW attempting
> > >> deletion of its files.
> > >>
> > >> And indeed I see plenty (too many: many times per second?) of commits
> > >> after that, so the index itself is no longer referencing _3ou7.
> > >>
> > >> If you are failing to close all NRT readers then I would expect _3ou7
> > >> to be in the lsof output, but it's not.
> > >>
> > >> The NRT reader's close method has logic that notifies IndexWriter when
> > >> it's done "needing" the files, to emulate "delete on last close"
> > >> semantics for filesystems like HDFS that don't do that ... it's
> > >> possible something is wrong here.
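To make the "delete on last close" idea concrete, here is a toy model of per-file reference counting. It is not Lucene's IndexFileDeleter, and the class and method names are made up: each commit point and each open NRT reader increments the count of every file it uses, and a file is deleted only when its count drops to zero. A reader that is never closed pins its files indefinitely, while a bug in the close notification would leave files undeleted even though nothing holds them, which would be consistent with the _3ou7 observation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-file reference counting (hypothetical, simplified). */
public class RefCountModel {
  private final Map<String, Integer> refCounts = new HashMap<>();
  private final List<String> deleted = new ArrayList<>();

  /** A commit point or an opened reader starts using these files. */
  public void incRef(Collection<String> files) {
    for (String f : files) {
      refCounts.merge(f, 1, Integer::sum);
    }
  }

  /** A commit is dropped or a reader is closed; files nobody references are deleted. */
  public void decRef(Collection<String> files) {
    for (String f : files) {
      Integer count = refCounts.get(f);
      if (count == null) {
        throw new IllegalStateException("decRef of untracked file " + f);
      }
      if (count == 1) {
        refCounts.remove(f);
        deleted.add(f); // stands in for actually deleting the file
      } else {
        refCounts.put(f, count - 1);
      }
    }
  }

  public List<String> deletedFiles() {
    return deleted;
  }

  public int refCount(String file) {
    return refCounts.getOrDefault(file, 0);
  }
}
```

In the model, a segment referenced by one commit and one NRT reader survives the commit being dropped (for example after a forceMerge) and is only deleted when the reader's decRef arrives.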
> > >>
> > >> Can you set the (public, static) boolean
> > >> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate this
> > >> log?  This causes IW to log the ref count of each file it's tracking
> > >> ...
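A sketch of turning on the extra logging, assuming Lucene 5.x. The IndexFileDeleter messages go through IndexWriter's infoStream, so that must be enabled as well. Because IndexFileDeleter may not be visible outside its package (an assumption; check your version), this hypothetical helper declares itself in org.apache.lucene.index, which is a debugging-only hack.

```java
// Debugging hack (assumption): IndexFileDeleter may be package-private, so this
// hypothetical helper lives in Lucene's own package to reach its static flag.
package org.apache.lucene.index;

import java.io.PrintStream;

public final class RefCountDebug {
  private RefCountDebug() {}

  /** Turns on per-file ref-count logging; call before creating the IndexWriter. */
  public static void enableVerboseRefCounts() {
    IndexFileDeleter.VERBOSE_REF_COUNTS = true;
  }

  public static boolean verboseRefCountsEnabled() {
    return IndexFileDeleter.VERBOSE_REF_COUNTS;
  }

  /** Routes IndexWriter's infoStream (where the ref-count messages appear) to the given stream. */
  public static IndexWriterConfig withInfoStream(IndexWriterConfig iwc, PrintStream out) {
    return iwc.setInfoStream(out);
  }
}
```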
> > >>
> > >> I'll also add a bit more verbosity to IW when NRT readers are opened
> > >> and closed, for 5.4.0.
> > >>
> > >> Mike McCandless
> > >>
> > >> http://blog.mikemccandless.com
> > >>
> > >>
> > >> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
> > >> <rob.audenaerde@gmail.com> wrote:
> > >> > Hi all,
> > >> >
> > >> > I'm still debugging the growing index size. I think closing index readers
> > >> > might help (work in progress), but I can't really see them holding on to
> > >> > files (at least, not with lsof). Restarting the application sheds some
> > >> > light: I see logging about files that are no longer referenced.
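One common way to make sure every NRT reader is eventually closed is to let SearcherManager own the readers and pair every acquire() with a release() in a finally block. A minimal sketch assuming Lucene 5.x, where the constructor SearcherManager(IndexWriter, boolean, SearcherFactory) is available; the NrtSearch wrapper is hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;

/** Hypothetical wrapper showing the acquire/release discipline for NRT searchers. */
public class NrtSearch implements AutoCloseable {
  private final SearcherManager manager;

  public NrtSearch(IndexWriter writer) throws IOException {
    // applyAllDeletes = true; the default SearcherFactory wraps each reader in an IndexSearcher.
    this.manager = new SearcherManager(writer, true, new SearcherFactory());
  }

  /** Opens a new NRT reader if the index changed; the old one is closed once fully released. */
  public void refresh() throws IOException {
    manager.maybeRefresh();
  }

  /** Number of live documents visible to the current point-in-time reader. */
  public int liveDocCount() throws IOException {
    IndexSearcher searcher = manager.acquire();
    try {
      return searcher.getIndexReader().numDocs();
    } finally {
      // Always release, and never close the acquired reader directly.
      manager.release(searcher);
    }
  }

  @Override
  public void close() throws IOException {
    manager.close();
  }
}
```

Calling refresh() after indexing (or from a background thread) replaces hand-rolled DirectoryReader.openIfChanged calls; SearcherManager then closes superseded readers once the last searcher using them is released.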
> > >> >
> > >> > What I see is that there are files in the index directory that seem to
> > >> > no longer be referenced.
> > >> >
> > >> > I put the output of the infoStream online, because it is rather big (30MB
> > >> > gzipped): http://www.audenaerde.org/lucene/merges.log.gz
> > >> >
> > >> > Output of lsof (executed 'sudo lsof *' in the index directory). This is
> > >> > on a CentOS box (maybe that influences stuff as well?)
> > >> >
> > >> > COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
> > >> > java    30581 apache  mem    REG  253,0 3176094924 18880508 _4gs5_Lucene50_0.dvd
> > >> > java    30581 apache  mem    REG  253,0  505758610 18880546 _4gs5.fdt
> > >> > java    30581 apache  mem    REG  253,0  369563337 18880631 _4gs5_Lucene50_0.tim
> > >> > java    30581 apache  mem    REG  253,0  176344058 18880623 _4gs5_Lucene50_0.pos
> > >> > java    30581 apache  mem    REG  253,0  378055201 18880606 _4gs5_Lucene50_0.doc
> > >> > java    30581 apache  mem    REG  253,0  372579599 18880400 _4i5a_Lucene50_0.dvd
> > >> > java    30581 apache  mem    REG  253,0   82017447 18880748 _4g37.cfs
> > >> > java    30581 apache  mem    REG  253,0   85376507 18880721 _4fb3.cfs
> > >> > java    30581 apache  mem    REG  253,0  363493917 18880533 _4ct1_Lucene50_0.dvd
> > >> > java    30581 apache  mem    REG  253,0    9421892 18880806 _4gjc.cfs
> > >> > java    30581 apache  mem    REG  253,0   76877461 18880553 _4ct1.fdt
> > >> > java    30581 apache  mem    REG  253,0   46271330 18880661 _4ct1_Lucene50_0.tim
> > >> > java    30581 apache  mem    REG  253,0   26911387 18880653 _4ct1_Lucene50_0.pos
> > >> > java    30581 apache  mem    REG  253,0   54678249 18880568 _4ct1_Lucene50_0.doc
> > >> > java    30581 apache  mem    REG  253,0   76556587 18880328 _4i5a.fdt
> > >> > java    30581 apache  mem    REG  253,0   45032159 18880389 _4i5a_Lucene50_0.tim
> > >> > java    30581 apache  mem    REG  253,0   26486772 18880388 _4i5a_Lucene50_0.pos
> > >> > java    30581 apache  mem    REG  253,0   55411002 18880362 _4i5a_Lucene50_0.doc
> > >> > java    30581 apache  mem    REG  253,0   70484185 18880340 _4hkn.cfs
> > >> > java    30581 apache  mem    REG  253,0   10873921 18880324 _4gpz.cfs
> > >> > java    30581 apache  mem    REG  253,0   17230506 18880524 _4i11.cfs
> > >> > java    30581 apache  mem    REG  253,0    6706969 18880575 _4i0t.cfs
> > >> > java    30581 apache  mem    REG  253,0   15135578 18880624 _4i0i.cfs
> > >> > java    30581 apache  mem    REG  253,0   15368310 18880717 _4hzp.cfs
> > >> > java    30581 apache  mem    REG  253,0    5146140 18880583 _4hze.cfs
> > >> > java    30581 apache  mem    REG  253,0    2917380 18880411 _4gs5.nvd
> > >> > java    30581 apache  mem    REG  253,0    6871469 18880732 _4hod.cfs
> > >> > java    30581 apache  mem    REG  253,0    2860341 18880495 _4i84.cfs
> > >> > java    30581 apache  mem    REG  253,0     835726 18880660 _4i7z.cfs
> > >> > java    30581 apache  mem    REG  253,0    1005595 18880648 _4i7w.cfs
> > >> > java    30581 apache  mem    REG  253,0    5639672 18880401 _4i4o.cfs
> > >> > java    30581 apache  mem    REG  253,0    4388371 18880440 _4i4a.cfs
> > >> > java    30581 apache  mem    REG  253,0    1151845 18880512 _4i7v.cfs
> > >> > java    30581 apache  mem    REG  253,0     941773 18880613 _4i7x.cfs
> > >> > java    30581 apache  mem    REG  253,0     984023 18880588 _4i7o.cfs
> > >> > java    30581 apache  mem    REG  253,0    1790005 18880619 _4i7y.cfs
> > >> > java    30581 apache  mem    REG  253,0     466371 18880515 _4ct1.nvd
> > >> > java    30581 apache  mem    REG  253,0     723280 18880573 _4i7q.cfs
> > >> > java    30581 apache  mem    REG  253,0     806289 18880517 _4i7h.cfs
> > >> > java    30581 apache  mem    REG  253,0      17362 18880520 _4i9s.cfs
> > >> > java    30581 apache  mem    REG  253,0     698362 18880531 _4i9r.cfs
> > >> > java    30581 apache  mem    REG  253,0     483215 18880406 _4i5a.nvd
> > >> > java    30581 apache  mem    REG  253,0      14110 18880416 _4i9v.cfs
> > >> > java    30581 apache  mem    REG  253,0       6121 18880412 _4i9t.cfs
> > >> > java    30581 apache   30wW  REG  253,0          0 18877901 write.lock
> > >> >
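Comparing lsof output against a directory listing by eye is error-prone. A small stdlib-only sketch (hypothetical helper names; it assumes file names contain no spaces, which holds for Lucene index files) that pulls out lsof's NAME column and lists files on disk that the process does not have open or mapped:

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for comparing lsof output with a directory listing. */
public class LsofNames {
  /** Extracts the NAME column (the last whitespace-separated field) from lsof output. */
  public static Set<String> openFileNames(String lsofOutput) {
    Set<String> names = new TreeSet<>();
    for (String line : lsofOutput.split("\n")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("COMMAND")) {
        continue; // skip blank lines and the header row
      }
      String[] cols = trimmed.split("\\s+");
      names.add(cols[cols.length - 1]);
    }
    return names;
  }

  /** Files present on disk but not held open or mapped by the process. */
  public static Set<String> notOpen(Collection<String> onDisk, Set<String> open) {
    Set<String> result = new TreeSet<>(onDisk);
    result.removeAll(open);
    return result;
  }
}
```

A file missing from lsof is not necessarily garbage: small files such as segments_N, .si or .cfe are read once and closed, so they never show up there. The result is a list of candidates to check, not a list of files to delete.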
> > >> > Output of some of the biggest files in the index directory:
> > >> >
> > >> > -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> > >> > -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54 _4ct1_Lucene50_0.dvd
> > >> > -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06 _4gs5_Lucene50_0.tim
> > >> > -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09 _4i5a_Lucene50_0.dvd
> > >> > -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06 _4gs5_Lucene50_0.doc
> > >> > -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> > >> > -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> > >> > -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55 _3k3a_Lucene50_0.dvd
> > >> > -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10 _4gs5_Lucene50_0.dvd
> > >> >
> > >> > Note that the _3ou7 and _3k3a segments no longer appear to be in use.
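A sharper check than lsof is to compare file names against the segments in the latest commit. This stdlib-only sketch (hypothetical helper names) derives the segment name from each file name, so _4gs5_Lucene50_0.dvd and _4gs5.fdt both map to _4gs5, and reports files whose segment is not live. With Lucene 5.x the live names can presumably be collected from SegmentInfos.readLatestCommit(dir) via info.name of each SegmentCommitInfo; treat that as an assumption to verify against your version.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: finds per-segment files whose segment is not in the live commit. */
public class OrphanFiles {
  /** "_4gs5_Lucene50_0.dvd" -> "_4gs5", "_3ou7.cfs" -> "_3ou7"; null for non-segment files. */
  public static String segmentName(String fileName) {
    if (!fileName.startsWith("_")) {
      return null; // e.g. segments_N, write.lock
    }
    int end = fileName.length();
    int underscore = fileName.indexOf('_', 1);
    int dot = fileName.indexOf('.', 1);
    if (underscore != -1) end = Math.min(end, underscore);
    if (dot != -1) end = Math.min(end, dot);
    return fileName.substring(0, end);
  }

  /** Files on disk that belong to a segment not listed in liveSegments. */
  public static Set<String> orphans(Collection<String> filesOnDisk, Set<String> liveSegments) {
    Set<String> result = new TreeSet<>();
    for (String f : filesOnDisk) {
      String seg = segmentName(f);
      if (seg != null && !liveSegments.contains(seg)) {
        result.add(f);
      }
    }
    return result;
  }
}
```

Files from a merge that is still running, from flushed segments visible only to an open NRT reader, or from an older commit kept by a custom IndexDeletionPolicy also show up as "orphans", so the result is only meaningful on a quiesced index.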
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>
> > >>
> >



