lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: debugging growing index size
Date Fri, 13 Nov 2015 19:29:50 GMT
So with MMapDir at defaults (unmap is enabled) you see old files, with
no open file handles as reported by lsof, still existing in your index
directory, taking lots of space.

But with NIOFSDirectory the issue doesn't happen?  Are you sure?

I'll look at the 6.6 GB infoStream to see what it says about the ref counts.

Did you fix the issue in your app where you're not closing all opened
NRT readers?

Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 13, 2015 at 12:22 PM, Rob Audenaerde
<rob.audenaerde@gmail.com> wrote:
> I haven't disabled unmapping, and I am running out-of-the-box
> FSDirectory.open(). As I can see it tries to pick MMap.  For the test I
> explicitly constructed a NIOFSDIrectoryReader
>
> OS is (from the top of my head)  CentOS 6.x, Java 1.8.0u33. I can check
> later for more details.
> On Nov 13, 2015 18:07, "Uwe Schindler" <uwe@thetaphi.de> wrote:
>
>> Hi,
>>
>> Lucene has the workaround, so it should not happen, UNLESS you explicitly
>> disable the hack using MMapDirectory#setEnableUnmap(false).
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>> > -----Original Message-----
>> > From: will martin [mailto:wmartinusa@gmail.com]
>> > Sent: Friday, November 13, 2015 6:04 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Re: debugging growing index size
>> >
>> > Hi Rob:
>> >
>> >
>> > Doesn’t this look like known SE issue JDK-4724038 and discussed by Peter
>> > Levart and Uwe Schindler on a lucene-dev thread 9/9/2015?
>> >
>> > MappedByteBuffer …. what OS are you on Rob? What JVM?
>> >
>> > http://bugs.java.com/view_bug.do?bug_id=4724038
>> >
>> > http://mail-archives.apache.org/mod_mbox/lucene-
>> > dev/201509.mbox/%3C55F0461A.2070005@gmail.com%3E
>> >
>> > hth
>> > -will
>> >
>> >
>> >
>> > > On Nov 13, 2015, at 11:23 AM, Rob Audenaerde
>> > <rob.audenaerde@gmail.com> wrote:
>> > >
>> > > I'm currently running using NIOFS. It seems to prevent the issue from
>> > > appearing.
>> > >
>> > > This is a second run (with applied deletes etc)
>> > >
>> > > raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
>> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
>> > >
>> > > raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
>> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
>> > >
>> > >
>> > >
>> > > On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
>> > > lucene@mikemccandless.com> wrote:
>> > >
>> > >> Hi Rob,
>> > >>
>> > >> A couple more things:
>> > >>
>> > >> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
>> > >>
>> > >> Also, can you try your test using NIOFSDirectory instead?  Curious
if
>> > >> that changes things...
>> > >>
>> > >> Mike McCandless
>> > >>
>> > >> http://blog.mikemccandless.com
>> > >>
>> > >>
>> > >> On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
>> > >> <rob.audenaerde@gmail.com> wrote:
>> > >>> Curious indeed!
>> > >>>
>> > >>> I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and recreate
>> > the
>> > >>> logs. Will get back with them in a day hopefully.
>> > >>>
>> > >>> Thanks for the extra logging!
>> > >>>
>> > >>> -Rob
>> > >>>
>> > >>> On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
>> > >>> lucene@mikemccandless.com> wrote:
>> > >>>
>> > >>>> Hmm, curious.
>> > >>>>
>> > >>>> I looked at the [large] infoStream output and I see segment
_3ou7
>> > >>>> present on init of IW, a few getReader calls referencing it,
then a
>> > >>>> forceMerge that indeed merges it away, yet I do NOT see IW
>> > attempting
>> > >>>> deletion of its files.
>> > >>>>
>> > >>>> And indeed I see plenty (too many: many times per second?)
of
>> > commits
>> > >>>> after that, so the index itself is no longer referencing _3ou7.
>> > >>>>
>> > >>>> If you are failing to close all NRT readers then I would expect
>> _3ou7
>> > >>>> to be in the lsof output, but it's not.
>> > >>>>
>> > >>>> The NRT readers close method has logic that notifies IndexWriter
>> when
>> > >>>> it's done "needing" the files, to emulate "delete on last close"
>> > >>>> semantics for filesystems like HDFS that don't do that ...
it's
>> > >>>> possible something is wrong here.
>> > >>>>
>> > >>>> Can you set the (public, static) boolean
>> > >>>> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-
>> > generate this
>> > >>>> log?  This causes IW to log the ref count of each file it's
tracking
>> > >>>> ...
>> > >>>>
>> > >>>> I'll also add a bit more verbosity to IW when NRT readers are
opened
>> > >>>> and close, for 5.4.0.
>> > >>>>
>> > >>>> Mike McCandless
>> > >>>>
>> > >>>> http://blog.mikemccandless.com
>> > >>>>
>> > >>>>
>> > >>>> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
>> > >>>> <rob.audenaerde@gmail.com> wrote:
>> > >>>>> Hi all,
>> > >>>>>
>> > >>>>> I'm still debugging the growing-index size. I think closing
index
>> > >> readers
>> > >>>>> might help (work in progress), but I can't really see them
holding
>> on
>> > >> to
>> > >>>>> files (at least, using lsof ). Restarting the application
sheds
>> some
>> > >>>> light,
>> > >>>>> I see logging on files that are no longer referenced.
>> > >>>>>
>> > >>>>> What I see is that there are files in the index-directory,
that
>> seem
>> > >> to
>> > >>>>> longer referenced..
>> > >>>>>
>> > >>>>> I put the output of the infoStream online, because is it
rather big
>> > >> (30MB
>> > >>>>> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
>> > >>>>>
>> > >>>>> Output of lsof:  (executed 'sudo lsof *' in the index directory
 ).
>> > >> This
>> > >>>> is
>> > >>>>> on an CentOS box (maybe that influences stuff as well?)
>> > >>>>>
>> > >>>>> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF    
NODE NAME
>> > >>>>> java    30581 apache  mem    REG  253,0 3176094924 18880508
>> > >>>>> _4gs5_Lucene50_0.dvd
>> > >>>>> java    30581 apache  mem    REG  253,0  505758610 18880546
>> _4gs5.fdt
>> > >>>>> java    30581 apache  mem    REG  253,0  369563337 18880631
>> > >>>>> _4gs5_Lucene50_0.tim
>> > >>>>> java    30581 apache  mem    REG  253,0  176344058 18880623
>> > >>>>> _4gs5_Lucene50_0.pos
>> > >>>>> java    30581 apache  mem    REG  253,0  378055201 18880606
>> > >>>>> _4gs5_Lucene50_0.doc
>> > >>>>> java    30581 apache  mem    REG  253,0  372579599 18880400
>> > >>>>> _4i5a_Lucene50_0.dvd
>> > >>>>> java    30581 apache  mem    REG  253,0   82017447 18880748
>> _4g37.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   85376507 18880721
>> _4fb3.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0  363493917 18880533
>> > >>>>> _4ct1_Lucene50_0.dvd
>> > >>>>> java    30581 apache  mem    REG  253,0    9421892 18880806
>> _4gjc.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   76877461 18880553
>> _4ct1.fdt
>> > >>>>> java    30581 apache  mem    REG  253,0   46271330 18880661
>> > >>>>> _4ct1_Lucene50_0.tim
>> > >>>>> java    30581 apache  mem    REG  253,0   26911387 18880653
>> > >>>>> _4ct1_Lucene50_0.pos
>> > >>>>> java    30581 apache  mem    REG  253,0   54678249 18880568
>> > >>>>> _4ct1_Lucene50_0.doc
>> > >>>>> java    30581 apache  mem    REG  253,0   76556587 18880328
>> _4i5a.fdt
>> > >>>>> java    30581 apache  mem    REG  253,0   45032159 18880389
>> > >>>>> _4i5a_Lucene50_0.tim
>> > >>>>> java    30581 apache  mem    REG  253,0   26486772 18880388
>> > >>>>> _4i5a_Lucene50_0.pos
>> > >>>>> java    30581 apache  mem    REG  253,0   55411002 18880362
>> > >>>>> _4i5a_Lucene50_0.doc
>> > >>>>> java    30581 apache  mem    REG  253,0   70484185 18880340
>> _4hkn.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   10873921 18880324
>> _4gpz.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   17230506 18880524
>> _4i11.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    6706969 18880575
>> _4i0t.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   15135578 18880624
>> _4i0i.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   15368310 18880717
>> _4hzp.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    5146140 18880583
>> _4hze.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    2917380 18880411
>> _4gs5.nvd
>> > >>>>> java    30581 apache  mem    REG  253,0    6871469 18880732
>> _4hod.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    2860341 18880495
>> _4i84.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     835726 18880660
>> _4i7z.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    1005595 18880648
>> _4i7w.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    5639672 18880401
>> _4i4o.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    4388371 18880440
>> _4i4a.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    1151845 18880512
>> _4i7v.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     941773 18880613
>> _4i7x.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     984023 18880588
>> _4i7o.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    1790005 18880619
>> _4i7y.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     466371 18880515
>> _4ct1.nvd
>> > >>>>> java    30581 apache  mem    REG  253,0     723280 18880573
>> _4i7q.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     806289 18880517
>> _4i7h.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0      17362 18880520
>> _4i9s.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     698362 18880531
>> _4i9r.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     483215 18880406
>> _4i5a.nvd
>> > >>>>> java    30581 apache  mem    REG  253,0      14110 18880416
>> _4i9v.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0       6121 18880412
>> _4i9t.cfs
>> > >>>>> java    30581 apache   30wW  REG  253,0          0 18877901
>> write.lock
>> > >>>>>
>> > >>>>> Output of some of the biggest files in the index directory:
>> > >>>>>
>> > >>>>> -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
>> > >>>>> -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54
>> > >> _4ct1_Lucene50_0.dvd
>> > >>>>> -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06
>> > >> _4gs5_Lucene50_0.tim
>> > >>>>> -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09
>> > >> _4i5a_Lucene50_0.dvd
>> > >>>>> -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06
>> > >> _4gs5_Lucene50_0.doc
>> > >>>>> -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
>> > >>>>> -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
>> > >>>>> -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55
>> > >> _3k3a_Lucene50_0.dvd
>> > >>>>> -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10
>> > >> _4gs5_Lucene50_0.dvd
>> > >>>>>
>> > >>>>> Note that the 3ou7 and 3k3a segments no longer appear to
be in use?
>> > >>>>
>> > >>>>
>> ---------------------------------------------------------------------
>> > >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >>>>
>> > >>>>
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >>
>> > >>
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message