lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: debugging growing index size
Date Fri, 13 Nov 2015 20:37:32 GMT
I can see that segments that are deleted never decRef to 0, e.g.
_1dtf, I assume because NRT readers are not being closed.

I think you should fix that bug, so every NRT reader ever opened is
also closed (SearcherManager is a simple way to ensure this), and then
let's regroup if disk space is still being leaked with a single
IndexWriter.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 13, 2015 at 2:29 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> So with MMapDir at defaults (unmap is enabled) you see old files, with
> no open file handles as reported by lsof, still existing in your index
> directory, taking lots of space.
>
> But with NIOFSDirectory the issue doesn't happen?  Are you sure?
>
> I'll look at the 6.6 GB infoStream to see what it says about the ref counts.
>
> Did you fix the issue in your app where you're not closing all opened
> NRT readers?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Nov 13, 2015 at 12:22 PM, Rob Audenaerde
> <rob.audenaerde@gmail.com> wrote:
>> I haven't disabled unmapping, and I am running out-of-the-box
>> FSDirectory.open(). As I can see it tries to pick MMap.  For the test I
>> explicitly constructed a NIOFSDIrectoryReader
>>
>> OS is (from the top of my head)  CentOS 6.x, Java 1.8.0u33. I can check
>> later for more details.
>> On Nov 13, 2015 18:07, "Uwe Schindler" <uwe@thetaphi.de> wrote:
>>
>>> Hi,
>>>
>>> Lucene has the workaround, so it should not happen, UNLESS you explicitly
>>> disable the hack using MMapDirectory#setEnableUnmap(false).
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>> > -----Original Message-----
>>> > From: will martin [mailto:wmartinusa@gmail.com]
>>> > Sent: Friday, November 13, 2015 6:04 PM
>>> > To: java-user@lucene.apache.org
>>> > Subject: Re: debugging growing index size
>>> >
>>> > Hi Rob:
>>> >
>>> >
>>> > Doesn’t this look like known SE issue JDK-4724038 and discussed by Peter
>>> > Levart and Uwe Schindler on a lucene-dev thread 9/9/2015?
>>> >
>>> > MappedByteBuffer …. what OS are you on Rob? What JVM?
>>> >
>>> > http://bugs.java.com/view_bug.do?bug_id=4724038
>>> >
>>> > http://mail-archives.apache.org/mod_mbox/lucene-
>>> > dev/201509.mbox/%3C55F0461A.2070005@gmail.com%3E
>>> >
>>> > hth
>>> > -will
>>> >
>>> >
>>> >
>>> > > On Nov 13, 2015, at 11:23 AM, Rob Audenaerde
>>> > <rob.audenaerde@gmail.com> wrote:
>>> > >
>>> > > I'm currently running using NIOFS. It seems to prevent the issue from
>>> > > appearing.
>>> > >
>>> > > This is a second run (with applied deletes etc)
>>> > >
>>> > > raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
>>> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
>>> > >
>>> > > raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
>>> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
>>> > >
>>> > >
>>> > >
>>> > > On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
>>> > > lucene@mikemccandless.com> wrote:
>>> > >
>>> > >> Hi Rob,
>>> > >>
>>> > >> A couple more things:
>>> > >>
>>> > >> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
>>> > >>
>>> > >> Also, can you try your test using NIOFSDirectory instead?  Curious
if
>>> > >> that changes things...
>>> > >>
>>> > >> Mike McCandless
>>> > >>
>>> > >> http://blog.mikemccandless.com
>>> > >>
>>> > >>
>>> > >> On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
>>> > >> <rob.audenaerde@gmail.com> wrote:
>>> > >>> Curious indeed!
>>> > >>>
>>> > >>> I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and
recreate
>>> > the
>>> > >>> logs. Will get back with them in a day hopefully.
>>> > >>>
>>> > >>> Thanks for the extra logging!
>>> > >>>
>>> > >>> -Rob
>>> > >>>
>>> > >>> On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
>>> > >>> lucene@mikemccandless.com> wrote:
>>> > >>>
>>> > >>>> Hmm, curious.
>>> > >>>>
>>> > >>>> I looked at the [large] infoStream output and I see segment
_3ou7
>>> > >>>> present on init of IW, a few getReader calls referencing
it, then a
>>> > >>>> forceMerge that indeed merges it away, yet I do NOT see
IW
>>> > attempting
>>> > >>>> deletion of its files.
>>> > >>>>
>>> > >>>> And indeed I see plenty (too many: many times per second?)
of
>>> > commits
>>> > >>>> after that, so the index itself is no longer referencing
_3ou7.
>>> > >>>>
>>> > >>>> If you are failing to close all NRT readers then I would
expect
>>> _3ou7
>>> > >>>> to be in the lsof output, but it's not.
>>> > >>>>
>>> > >>>> The NRT readers close method has logic that notifies IndexWriter
>>> when
>>> > >>>> it's done "needing" the files, to emulate "delete on last
close"
>>> > >>>> semantics for filesystems like HDFS that don't do that
... it's
>>> > >>>> possible something is wrong here.
>>> > >>>>
>>> > >>>> Can you set the (public, static) boolean
>>> > >>>> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-
>>> > generate this
>>> > >>>> log?  This causes IW to log the ref count of each file
it's tracking
>>> > >>>> ...
>>> > >>>>
>>> > >>>> I'll also add a bit more verbosity to IW when NRT readers
are opened
>>> > >>>> and close, for 5.4.0.
>>> > >>>>
>>> > >>>> Mike McCandless
>>> > >>>>
>>> > >>>> http://blog.mikemccandless.com
>>> > >>>>
>>> > >>>>
>>> > >>>> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
>>> > >>>> <rob.audenaerde@gmail.com> wrote:
>>> > >>>>> Hi all,
>>> > >>>>>
>>> > >>>>> I'm still debugging the growing-index size. I think
closing index
>>> > >> readers
>>> > >>>>> might help (work in progress), but I can't really see
them holding
>>> on
>>> > >> to
>>> > >>>>> files (at least, using lsof ). Restarting the application
sheds
>>> some
>>> > >>>> light,
>>> > >>>>> I see logging on files that are no longer referenced.
>>> > >>>>>
>>> > >>>>> What I see is that there are files in the index-directory,
that
>>> seem
>>> > >> to
>>> > >>>>> longer referenced..
>>> > >>>>>
>>> > >>>>> I put the output of the infoStream online, because
is it rather big
>>> > >> (30MB
>>> > >>>>> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
>>> > >>>>>
>>> > >>>>> Output of lsof:  (executed 'sudo lsof *' in the index
directory  ).
>>> > >> This
>>> > >>>> is
>>> > >>>>> on an CentOS box (maybe that influences stuff as well?)
>>> > >>>>>
>>> > >>>>> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF
    NODE NAME
>>> > >>>>> java    30581 apache  mem    REG  253,0 3176094924
18880508
>>> > >>>>> _4gs5_Lucene50_0.dvd
>>> > >>>>> java    30581 apache  mem    REG  253,0  505758610
18880546
>>> _4gs5.fdt
>>> > >>>>> java    30581 apache  mem    REG  253,0  369563337
18880631
>>> > >>>>> _4gs5_Lucene50_0.tim
>>> > >>>>> java    30581 apache  mem    REG  253,0  176344058
18880623
>>> > >>>>> _4gs5_Lucene50_0.pos
>>> > >>>>> java    30581 apache  mem    REG  253,0  378055201
18880606
>>> > >>>>> _4gs5_Lucene50_0.doc
>>> > >>>>> java    30581 apache  mem    REG  253,0  372579599
18880400
>>> > >>>>> _4i5a_Lucene50_0.dvd
>>> > >>>>> java    30581 apache  mem    REG  253,0   82017447
18880748
>>> _4g37.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   85376507
18880721
>>> _4fb3.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0  363493917
18880533
>>> > >>>>> _4ct1_Lucene50_0.dvd
>>> > >>>>> java    30581 apache  mem    REG  253,0    9421892
18880806
>>> _4gjc.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   76877461
18880553
>>> _4ct1.fdt
>>> > >>>>> java    30581 apache  mem    REG  253,0   46271330
18880661
>>> > >>>>> _4ct1_Lucene50_0.tim
>>> > >>>>> java    30581 apache  mem    REG  253,0   26911387
18880653
>>> > >>>>> _4ct1_Lucene50_0.pos
>>> > >>>>> java    30581 apache  mem    REG  253,0   54678249
18880568
>>> > >>>>> _4ct1_Lucene50_0.doc
>>> > >>>>> java    30581 apache  mem    REG  253,0   76556587
18880328
>>> _4i5a.fdt
>>> > >>>>> java    30581 apache  mem    REG  253,0   45032159
18880389
>>> > >>>>> _4i5a_Lucene50_0.tim
>>> > >>>>> java    30581 apache  mem    REG  253,0   26486772
18880388
>>> > >>>>> _4i5a_Lucene50_0.pos
>>> > >>>>> java    30581 apache  mem    REG  253,0   55411002
18880362
>>> > >>>>> _4i5a_Lucene50_0.doc
>>> > >>>>> java    30581 apache  mem    REG  253,0   70484185
18880340
>>> _4hkn.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   10873921
18880324
>>> _4gpz.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   17230506
18880524
>>> _4i11.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    6706969
18880575
>>> _4i0t.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   15135578
18880624
>>> _4i0i.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   15368310
18880717
>>> _4hzp.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    5146140
18880583
>>> _4hze.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    2917380
18880411
>>> _4gs5.nvd
>>> > >>>>> java    30581 apache  mem    REG  253,0    6871469
18880732
>>> _4hod.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    2860341
18880495
>>> _4i84.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     835726
18880660
>>> _4i7z.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    1005595
18880648
>>> _4i7w.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    5639672
18880401
>>> _4i4o.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    4388371
18880440
>>> _4i4a.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    1151845
18880512
>>> _4i7v.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     941773
18880613
>>> _4i7x.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     984023
18880588
>>> _4i7o.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    1790005
18880619
>>> _4i7y.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     466371
18880515
>>> _4ct1.nvd
>>> > >>>>> java    30581 apache  mem    REG  253,0     723280
18880573
>>> _4i7q.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     806289
18880517
>>> _4i7h.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0      17362
18880520
>>> _4i9s.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     698362
18880531
>>> _4i9r.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     483215
18880406
>>> _4i5a.nvd
>>> > >>>>> java    30581 apache  mem    REG  253,0      14110
18880416
>>> _4i9v.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0       6121
18880412
>>> _4i9t.cfs
>>> > >>>>> java    30581 apache   30wW  REG  253,0          0
18877901
>>> write.lock
>>> > >>>>>
>>> > >>>>> Output of some of the biggest files in the index directory:
>>> > >>>>>
>>> > >>>>> -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04
_4fjn.cfs
>>> > >>>>> -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54
>>> > >> _4ct1_Lucene50_0.dvd
>>> > >>>>> -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06
>>> > >> _4gs5_Lucene50_0.tim
>>> > >>>>> -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09
>>> > >> _4i5a_Lucene50_0.dvd
>>> > >>>>> -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06
>>> > >> _4gs5_Lucene50_0.doc
>>> > >>>>> -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14
_3ou7.cfs
>>> > >>>>> -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04
_4gs5.fdt
>>> > >>>>> -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55
>>> > >> _3k3a_Lucene50_0.dvd
>>> > >>>>> -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10
>>> > >> _4gs5_Lucene50_0.dvd
>>> > >>>>>
>>> > >>>>> Note that the 3ou7 and 3k3a segments no longer appear
to be in use?
>>> > >>>>
>>> > >>>>
>>> ---------------------------------------------------------------------
>>> > >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> > >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> > >>>>
>>> > >>>>
>>> > >>
>>> > >> ---------------------------------------------------------------------
>>> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> > >>
>>> > >>
>>> >
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message