lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: optimize with num segments > 1 index keeps growing
Date Thu, 21 Jul 2011 20:45:53 GMT
There is also IndexWriter.expungeDeletes(), which merges away deleted documents without forcing the index down to a single segment.
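
To put numbers on why the deletes in the large segment matter, here is a small arithmetic sketch in Java that uses only the byte counts from the directory listings quoted below. It does not call Lucene at all; the class name SegmentSizes and its methods are made up for illustration.

```java
/**
 * Arithmetic on the byte counts from the directory listings in this thread.
 * No Lucene calls; SegmentSizes and its members are illustrative names only.
 */
final class SegmentSizes {

    // Files of the large segment _52aho, identical before and after the
    // 2-segment optimize. The small .del file is left out because it only
    // marks documents as deleted.
    static final long[] BIG_SEGMENT_FILES = {
        33_445_798_886L, // _52aho.fdt
        178_723_932L,    // _52aho.fdx
        5_002L,          // _52aho.fnm
        9_857_410_889L,  // _52aho.frq
        4_538_234_846L,  // _52aho.prx
        61_581_767L,     // _52aho.tii
        5_505_039_790L   // _52aho.tis
    };

    // Directory totals as reported at the end of the second and third listings.
    static final long TOTAL_AFTER_OPTIMIZE_2 = 61_840_347_291L;
    static final long TOTAL_AFTER_OPTIMIZE_1 = 50_618_654_908L;

    /** Sum of the large segment's files, in bytes. */
    static long bigSegmentBytes() {
        long sum = 0;
        for (long size : BIG_SEGMENT_FILES) {
            sum += size;
        }
        return sum;
    }

    /** Bytes freed by going from the 2-segment to the 1-segment index. */
    static long reclaimedBytes() {
        return TOTAL_AFTER_OPTIMIZE_2 - TOTAL_AFTER_OPTIMIZE_1;
    }

    /** Whole mebibytes (the unit the poster calls "Mb"), rounded down. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Share of the 2-segment index taken by the large segment, rounded down. */
    static long bigSegmentPercent() {
        return bigSegmentBytes() * 100 / TOTAL_AFTER_OPTIMIZE_2;
    }

    public static void main(String[] args) {
        System.out.println("large segment:  " + toMiB(bigSegmentBytes()) + " MiB");
        System.out.println("share of index: " + bigSegmentPercent() + " %");
        System.out.println("reclaimed:      " + toMiB(reclaimedBytes()) + " MiB");
    }
}
```

By this count the untouched segment makes up roughly 87% of the index after the 2-segment optimize, and the 1-segment optimize frees about 11.2 billion bytes (a bit over 10 GiB), which matches the "10 Gb" reported in the quoted message.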

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: v.sevel@lombardodier.com [mailto:v.sevel@lombardodier.com]
> Sent: Thursday, July 21, 2011 8:39 PM
> To: java-user@lucene.apache.org
> Subject: Re: optimize with num segments > 1 index keeps growing
> 
> Hi, thanks for this explanation.
> So what is the best solution: merge the large segment (how can I do that?), or
> work with many segments (10?) so that I avoid this "large segment" issue?
> thanks,
> vince
> 
> 
> Vincent Sevel
> Lombard Odier Darier Hentsch & Cie
> 11, rue de la Corraterie - 1204 Genève - Suisse
> T +41 22 709 3376 - F +41 22 709 3782
> www.lombardodier.com
>
> On 21.07.2011 20:06, Simon Willnauer <simon.willnauer@googlemail.com> wrote:
>
> So the problem here is that you have one really big segment, _52aho.*, and
> several smaller ones: _7e0wz.*, _7e0xu.*, _7e1x5.*, ...
> If you optimize to 2 segments, all the smaller segments are merged into one,
> but the large segment remains untouched. This means that the deleted
> documents in the large segment are not removed / freed, whereas if you
> optimize to one segment they are removed. In the single-segment
> index there is no *.del file present, meaning no deletes. Unless you merge
> the large segment, all your deleted documents are only marked as deleted but
> not yet removed.
> 
> simon
> 
> On Thu, Jul 21, 2011 at 5:50 PM,  <v.sevel@lombardodier.com> wrote:
> > hi,
> > closing after the 2 segments optimize does not change it.
> > also I am running with lucene 3.1.0.
> > cheers,
> > vince
> >
> > On 21.07.2011 17:30, Ian Lea <ian.lea@gmail.com> wrote:
> >
> > A write.lock file with timestamp of 13:58 is in all the listings. The
> > first thing I'd try is to add some IndexWriter.close() calls.
> >
> >
> > --
> > Ian.
> >
> >
> >
> > On Thu, Jul 21, 2011 at 4:05 PM,  <v.sevel@lombardodier.com> wrote:
> >> Hi,
> >>
> >> here is a concrete example.
> >>
> >> I am starting with an index that has 19017236 docs, which takes 58989 MB
> >> on disk:
> >>
> >> 21.07.2011  15:21                20 segments.gen
> >> 21.07.2011  15:21             2'974 segments_2acy4
> >> 21.07.2011  13:58                 0 write.lock
> >> 16.07.2011  02:21    33'445'798'886 _52aho.fdt
> >> 16.07.2011  02:21       178'723'932 _52aho.fdx
> >> 16.07.2011  01:58             5'002 _52aho.fnm
> >> 16.07.2011  03:10     9'857'410'889 _52aho.frq
> >> 16.07.2011  03:10     4'538'234'846 _52aho.prx
> >> 16.07.2011  03:10        61'581'767 _52aho.tii
> >> 16.07.2011  03:10     5'505'039'790 _52aho.tis
> >> 21.07.2011  01:01         1'899'536 _52aho_5.del
> >> 21.07.2011  01:05     4'222'206'034 _6t61z.fdt
> >> 21.07.2011  01:05        21'424'556 _6t61z.fdx
> >> 21.07.2011  01:01             5'002 _6t61z.fnm
> >> 21.07.2011  01:12     1'170'370'187 _6t61z.frq
> >> 21.07.2011  01:12       598'373'388 _6t61z.prx
> >> 21.07.2011  01:12         7'574'912 _6t61z.tii
> >> 21.07.2011  01:12       678'766'206 _6t61z.tis
> >> 21.07.2011  13:46     1'458'592'058 _7d6me.cfs
> >> 21.07.2011  13:48        15'702'654 _7dhgz.cfs
> >> 21.07.2011  13:52        16'800'942 _7dphm.cfs
> >> 21.07.2011  13:55        16'714'431 _7dxht.cfs
> >> 21.07.2011  14:24        17'505'435 _7e0wz.cfs
> >> 21.07.2011  14:24         5'875'852 _7e0xu.cfs
> >> 21.07.2011  14:48        18'340'470 _7e1x5.cfs
> >> 21.07.2011  15:19        16'978'564 _7e3ck.cfs
> >> 21.07.2011  15:21         1'208'656 _7e3hv.cfs
> >> 21.07.2011  15:21            19'361 _7e3hw.cfs
> >>              28 File(s) 61'855'156'350 bytes
> >>
> >> I then delete some of the older documents. After the delete,
> >> I commit, then optimize down to 2 segments. At the end of the optimize the
> >> index contains 18702510 docs (314727 were deleted) and it now takes 58975
> >> MB on disk:
> >>
> >> 21.07.2011  15:37                20 segments.gen
> >> 21.07.2011  15:37               524 segments_2acy6
> >> 21.07.2011  13:58                 0 write.lock
> >> 16.07.2011  02:21    33'445'798'886 _52aho.fdt
> >> 16.07.2011  02:21       178'723'932 _52aho.fdx
> >> 16.07.2011  01:58             5'002 _52aho.fnm
> >> 16.07.2011  03:10     9'857'410'889 _52aho.frq
> >> 16.07.2011  03:10     4'538'234'846 _52aho.prx
> >> 16.07.2011  03:10        61'581'767 _52aho.tii
> >> 16.07.2011  03:10     5'505'039'790 _52aho.tis
> >> 21.07.2011  15:23         1'999'945 _52aho_6.del
> >> 21.07.2011  15:31     5'194'848'138 _7e3hy.fdt
> >> 21.07.2011  15:31        28'613'668 _7e3hy.fdx
> >> 21.07.2011  15:25             5'002 _7e3hy.fnm
> >> 21.07.2011  15:37     1'529'771'296 _7e3hy.frq
> >> 21.07.2011  15:37       726'582'244 _7e3hy.prx
> >> 21.07.2011  15:37         8'518'198 _7e3hy.tii
> >> 21.07.2011  15:37       763'213'144 _7e3hy.tis
> >>              18 File(s) 61'840'347'291 bytes
> >>
> >> As you can see, the size on disk did not really change. At this point I
> >> optimize down to 1 segment, and at the end the index takes 48273 MB on
> >> disk:
> >>
> >> 21.07.2011  16:46                20 segments.gen
> >> 21.07.2011  16:46               278 segments_2acy8
> >> 21.07.2011  13:58                 0 write.lock
> >> 21.07.2011  16:06    32'901'423'750 _7e3hz.fdt
> >> 21.07.2011  16:06       149'582'052 _7e3hz.fdx
> >> 21.07.2011  15:42             5'002 _7e3hz.fnm
> >> 21.07.2011  16:46     8'608'541'177 _7e3hz.frq
> >> 21.07.2011  16:46     4'392'616'115 _7e3hz.prx
> >> 21.07.2011  16:46        50'571'856 _7e3hz.tii
> >> 21.07.2011  16:46     4'515'914'658 _7e3hz.tis
> >>              10 File(s) 50'618'654'908 bytes
> >>
> >>
> >> This means that with the 1-segment optimize I was able to reclaim about 10 GB
> >> on disk that the 2-segment optimize could not.
> >>
> >> How can this be explained? Is this normal behavior?
> >>
> >> thanks,
> >>
> >> vince
> >>
> >> On 20.07.2011 23:11, Simon Willnauer <simon.willnauer@googlemail.com> wrote:
> >>
> >> On Wed, Jul 20, 2011 at 2:00 PM,  <v.sevel@lombardodier.com> wrote:
> >>> Hi,
> >>>
> >>> I index several million small documents per day. Each day, I remove some
> >>> of the older documents to keep the index at a stable number of documents.
> >>> After each purge, I commit, then I optimize the index. What I found is that
> >>> if I keep optimizing with max num segments = 2, the index keeps
> >>> growing on disk, but as soon as I optimize with just 1 segment, the
> >>> space gets reclaimed. So I have currently adopted the
> >>> following strategy: every night I optimize with 2 segments, except once
> >>> per week, when I optimize with just 1 segment.
> >>
> >> What do you mean by "keeps growing"? You have n segments, you
> >> optimize down to 2, and the index is bigger than the one with n
> >> segments?
> >>
> >> simon
> >>>
> >>> Is that expected behavior?
> >>> I guess I am doing something special, because I was not able to reproduce
> >>> this behavior in a unit test. What could it be?
> >>>
> >>> It would be nice to have some explanatory services within the product to
> >>> help understand its behavior: something that tells you some
> >>> information about your index, for instance (number of docs in the different
> >>> states, how the space is being used, ...). Lucene is a wonderful product,
> >>> but to me this is almost like black magic, and when there is a specific
> >>> behavior, I have few clues to figure things out by myself. Some
> >>> user-oriented logging would be nice as well (the index writer info stream
> >>> is really verbose and very low level).
> >>>
> >>> thanks for your help,
> >>>
> >>>
> >>> Vince
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> >
> > ************************ DISCLAIMER ************************
> > This message is intended only for use by the person to whom it is
> > addressed. It may contain information that is privileged and
> > confidential. Its content does not constitute a formal commitment by
> > Lombard Odier Darier Hentsch & Cie or any of its branches or
> > affiliates.
> > If you are not the intended recipient of this message, kindly notify
> > the sender immediately and destroy this message. Thank You.
> > *****************************************************************
> >
> 



