lucene-java-user mailing list archives

From Uwe Schindler <...@thetaphi.de>
Subject Re: optimize with num segments > 1 index keeps growing
Date Fri, 09 Sep 2011 19:07:45 GMT
Hi,

This is still some kind of bug, because expungeDeletes is documented to remove all deletes.
Maybe we need to modify MergePolicy?

Uwe
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de



Michael McCandless <lucene@mikemccandless.com> wrote:

During expungeDeletes(), TieredMergePolicy by default only merges a segment
if more than 10% of its documents are deleted.

Can you try calling .setExpungeDeletesPctAllowed(0.0) on the TieredMergePolicy and then expunge again?
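A minimal plain-Java sketch of the selection rule described above, with no Lucene dependency. It is a model, not TieredMergePolicy's source: the `Seg` class, segment names and document counts are hypothetical, and only the "strictly more than N% deleted" comparison and the 10.0 default come from this thread. With the threshold lowered to 0.0 (what `setExpungeDeletesPctAllowed(0.0)` followed by `expungeDeletes()` is meant to achieve), every segment with at least one deletion becomes eligible, including a large one with a low deletion ratio.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model (hypothetical, not Lucene code) of the rule described above:
// an expunge pass only picks segments whose share of deleted documents is
// strictly greater than expungeDeletesPctAllowed (10.0 by default).
final class ExpungeEligibility {

    // Hypothetical per-segment statistics.
    static final class Seg {
        final String name;
        final int maxDoc;      // all documents, including deleted ones
        final int deletedDocs; // documents only marked as deleted

        Seg(String name, int maxDoc, int deletedDocs) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.deletedDocs = deletedDocs;
        }

        double pctDeleted() {
            return 100.0 * deletedDocs / maxDoc;
        }
    }

    // Names of the segments an expunge pass would rewrite under this model.
    static List<String> eligible(List<Seg> segs, double pctAllowed) {
        List<String> names = new ArrayList<>();
        for (Seg s : segs) {
            if (s.pctDeleted() > pctAllowed) {
                names.add(s.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Seg> segs = Arrays.asList(
            new Seg("large", 14_000_000, 300_000), // about 2.1% deleted
            new Seg("small", 100_000, 20_000));    // 20% deleted
        System.out.println(eligible(segs, 10.0)); // [small]
        System.out.println(eligible(segs, 0.0));  // [large, small]
    }
}
```

Under the default threshold the large segment keeps its deletions, which matches the behavior reported below for the tiered policy.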

Mike McCandless

http://blog.mikemccandless.com

On Fri, Sep 9, 2011 at 1:41 PM, <v.sevel@lombardodier.com> wrote:
> Hi,
>
> this post is quite old, but I would like to share some recent developments.
>
> I applied the recommendation. my process became: expunge deletes, then
> optimize to 2 segments.
>
> at the time I was on lucene 3.1 and that solved my issue. recently I
> moved to lucene 3.3, and I tried playing with the new tiered merge policy.
> what I found was that after an expunge, the number of deleted docs would
> stay the same, and space would not be reclaimed on disk. I switched
> back to the default merge policy (LogByteSizeMergePolicy:
> minMergeSize=1677721, mergeFactor=10, maxMergeSize=2147483648,
> maxMergeSizeForOptimize=9223372036854775807, calibrateSizeByDeletes=true,
> maxMergeDocs=2147483647, useCompoundFile=true, noCFSRatio=0.1) and this
> time got the right behavior: space was reclaimed on disk. I also tried
> the BalancedSegmentMergePolicy and again got the right behavior.
>
> so this issue seems to affect only the tiered merge policy.
>
> to illustrate this, I took an index with many deleted docs, then
> expunged/optimized it using the tiered policy, then did the same thing
> with the default merge policy. here is the content of the directory
> at each step:
>
> before:
>
> 09.09.2011  17:38                20 segments.gen
> 09.09.2011  17:38             5'335 segments_4bf1u
> 06.09.2011  15:27                 0 write.lock
> 06.09.2011  00:49    31'681'157'794 _jhwld.fdt
> 06.09.2011  00:49       115'562'268 _jhwld.fdx
> 06.09.2011  00:37             5'347 _jhwld.fnm
> 06.09.2011  01:13     7'147'947'472 _jhwld.frq
> 06.09.2011  01:13     3'927'649'164 _jhwld.prx
> 06.09.2011  01:13        41'992'760 _jhwld.tii
> 06.09.2011  01:13     3'745'729'056 _jhwld.tis
> 09.09.2011  00:27         1'805'669 _jhwld_3.del
> 09.09.2011  00:31    11'397'619'448 _jtrwg.fdt
> 09.09.2011  00:31        98'393'316 _jtrwg.fdx
> 09.09.2011  00:27             5'347 _jtrwg.fnm
> 09.09.2011  00:47     5'146'273'732 _jtrwg.frq
> 09.09.2011  00:47     1'661'436'146 _jtrwg.prx
> 09.09.2011  00:47        23'950'194 _jtrwg.tii
> 09.09.2011  00:47     2'139'903'139 _jtrwg.tis
> 09.09.2011  07:39        94'471'867 _jugaa.cfs
> 09.09.2011  10:14       252'716'611 _juok2.cfs
> 09.09.2011  15:45         7'986'102 _jwuaq.cfs
> 09.09.2011  16:00         5'780'703 _jx45g.cfs
> 09.09.2011  16:00       333'981'384 _jx46a.cfs
> 09.09.2011  16:23        20'955'761 _jxge0.cfs
> 09.09.2011  16:46        19'258'025 _jxmas.cfs
> 09.09.2011  16:55        16'622'800 _jxpv4.cfs
> 09.09.2011  17:10        14'605'028 _jxvd6.cfs
> 09.09.2011  17:34        12'456'476 _jy28o.cfs
> 09.09.2011  17:38         2'584'950 _jy91y.cfs
> 09.09.2011  17:38         2'595'049 _jy92i.cfs
> 09.09.2011  17:38         2'600'991 _jy932.cfs
> 09.09.2011  17:38         2'610'278 _jy93m.cfs
> 09.09.2011  17:38            46'664 _jy93x.cfs
> 09.09.2011  17:38             9'765 _jy93y.cfs
> 09.09.2011  17:38            10'691 _jy93z.cfs
> 09.09.2011  17:38             9'533 _jy940.cfs
> 09.09.2011  17:38            11'684 _jy941.cfs
> 09.09.2011  17:38             8'996 _jy942.cfs
>              38 File(s) 67'918'759'565 bytes
>
>
> after expunge/optimize (tiered merge policy):
>
> 09.09.2011  18:02                20 segments.gen
> 09.09.2011  18:02             3'171 segments_4bf3g
> 06.09.2011  15:27                 0 write.lock
> 06.09.2011  00:49    31'681'157'794 _jhwld.fdt
> 06.09.2011  00:49       115'562'268 _jhwld.fdx
> 06.09.2011  00:37             5'347 _jhwld.fnm
> 06.09.2011  01:13     7'147'947'472 _jhwld.frq
> 06.09.2011  01:13     3'927'649'164 _jhwld.prx
> 06.09.2011  01:13        41'992'760 _jhwld.tii
> 06.09.2011  01:13     3'745'729'056 _jhwld.tis
> 09.09.2011  17:39         1'805'669 _jhwld_4.del
> 09.09.2011  17:45    11'814'367'373 _jy9iy.fdt
> 09.09.2011  17:45       101'565'036 _jy9iy.fdx
> 09.09.2011  17:39             5'347 _jy9iy.fnm
> 09.09.2011  18:01     5'328'530'169 _jy9iy.frq
> 09.09.2011  18:01     1'733'490'572 _jy9iy.prx
> 09.09.2011  18:01        25'072'713 _jy9iy.tii
> 09.09.2011  18:01     2'239'702'399 _jy9iy.tis
> 09.09.2011  18:02           185'962 _jy9mv.cfs
> 09.09.2011  18:02             9'955 _jy9mw.cfs
> 09.09.2011  18:02            10'380 _jy9mx.cfs
> 09.09.2011  18:02             9'341 _jy9my.cfs
> 09.09.2011  18:02             9'228 _jy9mz.cfs
> 09.09.2011  18:02            10'382 _jy9n0.cfs
> 09.09.2011  18:02             9'345 _jy9n1.cfs
> 09.09.2011  18:02             9'231 _jy9n2.cfs
> 09.09.2011  18:02             8'961 _jy9n3.cfs
> 09.09.2011  18:02            10'381 _jy9n4.cfs
> 09.09.2011  18:02           199'651 _jy9n5.cfs
> 09.09.2011  18:02             9'345 _jy9n6.cfs
> 09.09.2011  18:02             9'230 _jy9n7.cfs
>              31 File(s) 67'905'077'722 bytes
>
> after expungeDeletes/optimize (default merge policy):
>
> 09.09.2011  19:31                20 segments.gen
> 09.09.2011  19:31             2'081 segments_4bfpe
> 09.09.2011  18:13                 0 write.lock
> 09.09.2011  18:42    30'133'772'814 _jyb4c.fdt
> 09.09.2011  18:42       103'164'812 _jyb4c.fdx
> 09.09.2011  18:27             5'347 _jyb4c.fnm
> 09.09.2011  19:03     6'474'023'590 _jyb4c.frq
> 09.09.2011  19:03     3'699'406'141 _jyb4c.prx
> 09.09.2011  19:03        37'900'657 _jyb4c.tii
> 09.09.2011  19:03     3'380'266'875 _jyb4c.tis
> 09.09.2011  19:15    11'820'477'088 _jyb4e.fdt
> 09.09.2011  19:15       101'659'700 _jyb4e.fdx
> 09.09.2011  19:03             5'347 _jyb4e.fnm
> 09.09.2011  19:29     5'333'219'797 _jyb4e.frq
> 09.09.2011  19:29     1'734'633'179 _jyb4e.prx
> 09.09.2011  19:29        25'105'023 _jyb4e.tii
> 09.09.2011  19:29     2'242'558'333 _jyb4e.tis
> 09.09.2011  19:31           223'600 _jyb5t.cfs
> 09.09.2011  19:31             9'545 _jyb5u.cfs
> 09.09.2011  19:31             8'963 _jyb5v.cfs
> 09.09.2011  19:31             9'250 _jyb5w.cfs
> 09.09.2011  19:31             9'047 _jyb5x.cfs
> 09.09.2011  19:31            11'253 _jyb5y.cfs
> 09.09.2011  19:31            11'239 _jyb5z.cfs
>              24 File(s) 65'086'483'701 bytes
>
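The directory totals of the three listings above make the difference concrete. A small plain-Java sketch with a hypothetical `parseSize` helper for the `'` thousands separator used in the listings, computing the drop from each listing to the next. It assumes, as the continuing segment names and the new write.lock timestamp suggest, that the default-policy run started from the output of the tiered run.

```java
// Compares the directory totals printed in the three listings above.
// parseSize() and reclaimed() are hypothetical helpers, not Lucene APIs.
final class ReclaimedSpace {

    // Sizes are printed with ' as thousands separator, e.g. 67'918'759'565.
    static long parseSize(String printed) {
        return Long.parseLong(printed.replace("'", ""));
    }

    static long reclaimed(String before, String after) {
        return parseSize(before) - parseSize(after);
    }

    public static void main(String[] args) {
        String before = "67'918'759'565";      // initial listing
        String tiered = "67'905'077'722";      // after the TieredMergePolicy run
        String logByteSize = "65'086'483'701"; // after the default-policy run
        System.out.println(reclaimed(before, tiered));      // 13681843 (about 13.7 MB)
        System.out.println(reclaimed(tiered, logByteSize)); // 2818594021 (about 2.8 GB)
    }
}
```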
> any clue as to what is happening?
>
> thanks,
>
>
> Vincent
>
> From: "Uwe Schindler" <uwe@thetaphi.de>
> Date: 21.07.2011 22:46
> Reply-To: java-user@lucene.apache.org
> To: <java-user@lucene.apache.org>
> Subject: RE: optimize with num segments > 1 index keeps growing
>
> There is also expungeDeletes()...
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: v.sevel@lombardodier.com [mailto:v.sevel@lombardodier.com]
>> Sent: Thursday, July 21, 2011 8:39 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: optimize with num segments > 1 index keeps growing
>>
>> Hi, thanks for this explanation.
>> so what is the best solution: merge the large segment (how can I do
>> that) or work with many segments (10?) so that I avoid having this
>> "large segment" issue?
>> thanks,
>> vince
>>
>>
>> Vincent Sevel
>> Lombard Odier Darier Hentsch & Cie
>> 11, rue de la Corraterie - 1204 Genève - Suisse
>> T +41 22 709 3376 - F +41 22 709 3782
>> www.lombardodier.com
>>
>> From: Simon Willnauer <simon.willnauer@googlemail.com>
>> Date: 21.07.2011 20:06
>> Reply-To: java-user@lucene.apache.org
>> To: java-user@lucene.apache.org
>> Subject: Re: optimize with num segments > 1 index keeps growing
>>
>> so the problem here is that you have one really big segment _52aho.* and
>> several smaller ones _7e0wz.*, _7e0xu.*, _7e1x5.* ...
>> if you optimize to 2 segments, all the smaller segments are merged into
>> one but the large segment remains untouched. This means that the deleted
>> documents in the large segment are not removed / freed, whereas if you
>> optimize to one segment they are removed. In the single-segment index
>> there is no *.del file present, meaning no deletes. Unless you merge the
>> large segment, the documents deleted from it are only marked as deleted
>> but not yet removed.
>>
>> simon
>>
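A toy plain-Java model of the explanation above. It is deliberately simpler than Lucene's real optimize(int maxNumSegments) merge selection: it just keeps the (maxNumSegments - 1) largest segments untouched and merges everything else into one new segment. Segment sizes are hypothetical. Because only merged segments shed the bytes of their deleted documents, optimizing to 2 leaves the large segment's deletions on disk, while optimizing to 1 drops them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the explanation above, NOT Lucene's actual merge selection:
// optimizing to maxNumSegments keeps the (maxNumSegments - 1) largest
// segments untouched and merges all remaining segments into one.
// Deleted documents are physically dropped only from merged segments.
final class PartialOptimizeModel {

    // Hypothetical segment: bytes of live documents plus bytes still used by
    // documents that are only marked deleted (in the .del file).
    static final class Seg {
        final long liveBytes;
        final long deletedBytes;

        Seg(long liveBytes, long deletedBytes) {
            this.liveBytes = liveBytes;
            this.deletedBytes = deletedBytes;
        }

        long totalBytes() {
            return liveBytes + deletedBytes;
        }
    }

    static List<Seg> optimize(List<Seg> segs, int maxNumSegments) {
        if (maxNumSegments < 1) {
            throw new IllegalArgumentException("maxNumSegments must be >= 1");
        }
        List<Seg> sorted = new ArrayList<>(segs);
        // Largest segments first.
        sorted.sort((a, b) -> Long.compare(b.totalBytes(), a.totalBytes()));
        int keep = Math.min(maxNumSegments - 1, sorted.size());
        List<Seg> result = new ArrayList<>(sorted.subList(0, keep));
        if (keep < sorted.size()) {
            long mergedLive = 0;
            for (Seg s : sorted.subList(keep, sorted.size())) {
                mergedLive += s.liveBytes; // deleted docs are not copied
            }
            result.add(new Seg(mergedLive, 0));
        }
        return result;
    }

    static long diskBytes(List<Seg> segs) {
        long sum = 0;
        for (Seg s : segs) {
            sum += s.totalBytes();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Seg> segs = Arrays.asList(
            new Seg(40_000_000_000L, 10_000_000_000L), // one large segment
            new Seg(5_000_000_000L, 100_000_000L),
            new Seg(1_000_000_000L, 50_000_000L));
        System.out.println(diskBytes(segs));              // 56150000000
        System.out.println(diskBytes(optimize(segs, 2))); // 56000000000
        System.out.println(diskBytes(optimize(segs, 1))); // 46000000000
    }
}
```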
>> On Thu, Jul 21, 2011 at 5:50 PM,  <v.sevel@lombardodier.com> wrote:
>> > hi,
>> > closing after the 2 segments optimize does not change it.
>> > also I am running with lucene 3.1.0.
>> > cheers,
>> > vince
>> >
>> > From: Ian Lea <ian.lea@gmail.com>
>> > Date: 21.07.2011 17:30
>> > Reply-To: java-user@lucene.apache.org
>> > To: java-user@lucene.apache.org
>> > Subject: Re: optimize with num segments > 1 index keeps growing
>> >
>> > A write.lock file with timestamp of 13:58 is in all the listings. The
>> > first thing I'd try is to add some IndexWriter.close() calls.
>> >
>> >
>> > --
>> > Ian.
>> >
>> >
>> >
>> > On Thu, Jul 21, 2011 at 4:05 PM,  <v.sevel@lombardodier.com> wrote:
>> >> Hi,
>> >>
>> >> here is a concrete example.
>> >>
>> >> I am starting with an index that has 19017236 docs, which takes 58989 MB
>> >> on disk:
>> >>
>> >> 21.07.2011 15:21                20 segments.gen
>> >> 21.07.2011 15:21             2'974 segments_2acy4
>> >> 21.07.2011 13:58                 0 write.lock
>> >> 16.07.2011  02:21    33'445'798'886 _52aho.fdt
>> >> 16.07.2011  02:21       178'723'932 _52aho.fdx
>> >> 16.07.2011  01:58             5'002 _52aho.fnm
>> >> 16.07.2011  03:10     9'857'410'889 _52aho.frq
>> >> 16.07.2011  03:10     4'538'234'846 _52aho.prx
>> >> 16.07.2011  03:10        61'581'767 _52aho.tii
>> >> 16.07.2011  03:10     5'505'039'790 _52aho.tis
>> >> 21.07.2011 01:01         1'899'536 _52aho_5.del
>> >> 21.07.2011 01:05     4'222'206'034 _6t61z.fdt
>> >> 21.07.2011 01:05        21'424'556 _6t61z.fdx
>> >> 21.07.2011 01:01             5'002 _6t61z.fnm
>> >> 21.07.2011 01:12     1'170'370'187 _6t61z.frq
>> >> 21.07.2011  01:12       598'373'388 _6t61z.prx
>> >> 21.07.2011  01:12         7'574'912 _6t61z.tii
>> >> 21.07.2011  01:12       678'766'206 _6t61z.tis
>> >> 21.07.2011  13:46     1'458'592'058 _7d6me.cfs
>> >> 21.07.2011  13:48        15'702'654 _7dhgz.cfs
>> >> 21.07.2011  13:52        16'800'942 _7dphm.cfs
>> >> 21.07.2011  13:55        16'714'431 _7dxht.cfs
>> >> 21.07.2011  14:24        17'505'435 _7e0wz.cfs
>> >> 21.07.2011  14:24         5'875'852 _7e0xu.cfs
>> >> 21.07.2011  14:48        18'340'470 _7e1x5.cfs
>> >> 21.07.2011  15:19        16'978'564 _7e3ck.cfs
>> >> 21.07.2011  15:21         1'208'656 _7e3hv.cfs
>> >> 21.07.2011  15:21            19'361 _7e3hw.cfs
>> >>              28 File(s) 61'855'156'350 bytes
>> >>
>> >> I am deleting some of the older documents. after the delete, I commit,
>> >> then I optimize down to 2 segments. at the end of the optimize the
>> >> index contains 18702510 docs (314727 were deleted) and now takes 58975
>> >> MB on disk:
>> >>
>> >> 21.07.2011  15:37                20 segments.gen
>> >> 21.07.2011  15:37               524 segments_2acy6
>> >> 21.07.2011  13:58                 0 write.lock
>> >> 16.07.2011  02:21    33'445'798'886 _52aho.fdt
>> >> 16.07.2011  02:21       178'723'932 _52aho.fdx
>> >> 16.07.2011  01:58             5'002 _52aho.fnm
>> >> 16.07.2011  03:10     9'857'410'889 _52aho.frq
>> >> 16.07.2011  03:10     4'538'234'846 _52aho.prx
>> >> 16.07.2011  03:10        61'581'767 _52aho.tii
>> >> 16.07.2011  03:10     5'505'039'790 _52aho.tis
>> >> 21.07.2011  15:23         1'999'945 _52aho_6.del
>> >> 21.07.2011  15:31     5'194'848'138 _7e3hy.fdt
>> >> 21.07.2011  15:31        28'613'668 _7e3hy.fdx
>> >> 21.07.2011  15:25             5'002 _7e3hy.fnm
>> >> 21.07.2011  15:37     1'529'771'296 _7e3hy.frq
>> >> 21.07.2011  15:37       726'582'244 _7e3hy.prx
>> >> 21.07.2011  15:37         8'518'198 _7e3hy.tii
>> >> 21.07.2011  15:37       763'213'144 _7e3hy.tis
>> >>              18 File(s) 61'840'347'291 bytes
>> >>
>> >> as you can see, the size on disk did not really change. at this point I
>> >> optimize down to 1 segment, and at the end the index takes 48273 MB on
>> >> disk:
>> >>
>> >> 21.07.2011  16:46                20 segments.gen
>> >> 21.07.2011  16:46               278 segments_2acy8
>> >> 21.07.2011  13:58                 0 write.lock
>> >> 21.07.2011  16:06    32'901'423'750 _7e3hz.fdt
>> >> 21.07.2011  16:06       149'582'052 _7e3hz.fdx
>> >> 21.07.2011  15:42             5'002 _7e3hz.fnm
>> >> 21.07.2011  16:46     8'608'541'177 _7e3hz.frq
>> >> 21.07.2011  16:46     4'392'616'115 _7e3hz.prx
>> >> 21.07.2011  16:46        50'571'856 _7e3hz.tii
>> >> 21.07.2011  16:46     4'515'914'658 _7e3hz.tis
>> >>              10 File(s) 50'618'654'908 bytes
>> >>
>> >>
>> >> this means that with the 1-segment optimize I was able to reclaim about
>> >> 10 GB on disk, which the 2-segment optimize could not achieve.
>> >>
>> >> how can this be explained? is this normal behavior?
>> >>
>> >> thanks,
>> >>
>> >> vince
>> >>
>> >> From: Simon Willnauer <simon.willnauer@googlemail.com>
>> >> Date: 20.07.2011 23:11
>> >> Reply-To: java-user@lucene.apache.org
>> >> To: java-user@lucene.apache.org
>> >> Subject: Re: optimize with num segments > 1 index keeps growing
>> >>
>> >> On Wed, Jul 20, 2011 at 2:00 PM,  <v.sevel@lombardodier.com> wrote:
>> >>> Hi,
>> >>>
>> >>> I index several million small documents per day. each day, I remove
>> >>> some of the older documents to keep the index at a stable number of
>> >>> documents. after each purge, I commit, then I optimize the index. what
>> >>> I found is that if I keep optimizing with max num segments = 2, the
>> >>> index keeps growing on disk. but as soon as I optimize down to just 1
>> >>> segment, the space gets reclaimed on disk. so I have currently adopted
>> >>> the following strategy: every night I optimize with 2 segments, except
>> >

