lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: optimize with num segments > 1 index keeps growing
Date Thu, 21 Jul 2011 18:05:50 GMT
so the problem here is that you have one really big segment _52aho.*
and several smaller ones _7e0wz.*, _7e0xu.*, _7e1x5.* ....
if you optimize to 2 segmetns all the smaller segments are merged into
one but all the large segment remains untouched. This means that all
deleted documents in the large segment are not removed / freed while
if you optimized to one segment they are removed. In the single seg.
index there is no *.del file present meaning no deletes. Unless you
merge the large segment all you deleted documents are only marked as
delete but not yet removed.

simon

On Thu, Jul 21, 2011 at 5:50 PM,  <v.sevel@lombardodier.com> wrote:
> hi,
> closing after the 2 segments optimize does not change it.
> also I am running with lucene 3.1.0.
> cheers,
> vince
>
>
>
>
>
>
>
>
>
> Ian Lea <ian.lea@gmail.com>
>
>
> 21.07.2011 17:30
> Please respond to
> java-user@lucene.apache.org
>
>
>
> To
> java-user@lucene.apache.org
> cc
>
> Subject
> Re: optimize with num segments > 1 index keeps growing
>
>
>
>
>
>
> A write.lock file with timestamp of 13:58 is in all the listings. The
> first thing I'd try is to add some IndexWriter.close() calls.
>
>
> --
> Ian.
>
>
>
> On Thu, Jul 21, 2011 at 4:05 PM,  <v.sevel@lombardodier.com> wrote:
>> Hi,
>>
>> here is a concrete example.
>>
>> I am starting with an index that has 19017236 docs, which takes 58989 Mb
>> on disk:
>>
>> 21.07.2011 15:21                20 segments.gen
>> 21.07.2011 15:21             2'974 segments_2acy4
>> 21.07.2011 13:58                 0 write.lock
>> 16.07.2011  02:21    33'445'798'886 _52aho.fdt
>> 16.07.2011  02:21       178'723'932 _52aho.fdx
>> 16.07.2011  01:58             5'002 _52aho.fnm
>> 16.07.2011  03:10     9'857'410'889 _52aho.frq
>> 16.07.2011  03:10     4'538'234'846 _52aho.prx
>> 16.07.2011  03:10        61'581'767 _52aho.tii
>> 16.07.2011  03:10     5'505'039'790 _52aho.tis
>> 21.07.2011 01:01         1'899'536 _52aho_5.del
>> 21.07.2011 01:05     4'222'206'034 _6t61z.fdt
>> 21.07.2011 01:05        21'424'556 _6t61z.fdx
>> 21.07.2011 01:01             5'002 _6t61z.fnm
>> 21.07.2011 01:12     1'170'370'187 _6t61z.frq
>> 21.07.2011  01:12       598'373'388 _6t61z.prx
>> 21.07.2011  01:12         7'574'912 _6t61z.tii
>> 21.07.2011  01:12       678'766'206 _6t61z.tis
>> 21.07.2011  13:46     1'458'592'058 _7d6me.cfs
>> 21.07.2011  13:48        15'702'654 _7dhgz.cfs
>> 21.07.2011  13:52        16'800'942 _7dphm.cfs
>> 21.07.2011  13:55        16'714'431 _7dxht.cfs
>> 21.07.2011  14:24        17'505'435 _7e0wz.cfs
>> 21.07.2011  14:24         5'875'852 _7e0xu.cfs
>> 21.07.2011  14:48        18'340'470 _7e1x5.cfs
>> 21.07.2011  15:19        16'978'564 _7e3ck.cfs
>> 21.07.2011  15:21         1'208'656 _7e3hv.cfs
>> 21.07.2011  15:21            19'361 _7e3hw.cfs
>>              28 File(s) 61'855'156'350 bytes
>>
>> I am doing a delete of some of the older documents. after the delete, I
>> commit then I optimize down to 2 segments. at the end of the optimize
> the
>> index contains 18702510 docs (314727 were deleted) and it takes now
> 58975
>> Mb on disk:
>>
>> 21.07.2011  15:37                20 segments.gen
>> 21.07.2011  15:37               524 segments_2acy6
>> 21.07.2011  13:58                 0 write.lock
>> 16.07.2011  02:21    33'445'798'886 _52aho.fdt
>> 16.07.2011  02:21       178'723'932 _52aho.fdx
>> 16.07.2011  01:58             5'002 _52aho.fnm
>> 16.07.2011  03:10     9'857'410'889 _52aho.frq
>> 16.07.2011  03:10     4'538'234'846 _52aho.prx
>> 16.07.2011  03:10        61'581'767 _52aho.tii
>> 16.07.2011  03:10     5'505'039'790 _52aho.tis
>> 21.07.2011  15:23         1'999'945 _52aho_6.del
>> 21.07.2011  15:31     5'194'848'138 _7e3hy.fdt
>> 21.07.2011  15:31        28'613'668 _7e3hy.fdx
>> 21.07.2011  15:25             5'002 _7e3hy.fnm
>> 21.07.2011  15:37     1'529'771'296 _7e3hy.frq
>> 21.07.2011  15:37       726'582'244 _7e3hy.prx
>> 21.07.2011  15:37         8'518'198 _7e3hy.tii
>> 21.07.2011  15:37       763'213'144 _7e3hy.tis
>>              18 File(s) 61'840'347'291 bytes
>>
>> as you can see, size on disk did not really change. at this point I
>> optimize down to 1 segment and at the end the index takes 48273 Mb on
>> disk:
>>
>> 21.07.2011  16:46                20 segments.gen
>> 21.07.2011  16:46               278 segments_2acy8
>> 21.07.2011  13:58                 0 write.lock
>> 21.07.2011  16:06    32'901'423'750 _7e3hz.fdt
>> 21.07.2011  16:06       149'582'052 _7e3hz.fdx
>> 21.07.2011  15:42             5'002 _7e3hz.fnm
>> 21.07.2011  16:46     8'608'541'177 _7e3hz.frq
>> 21.07.2011  16:46     4'392'616'115 _7e3hz.prx
>> 21.07.2011  16:46        50'571'856 _7e3hz.tii
>> 21.07.2011  16:46     4'515'914'658 _7e3hz.tis
>>              10 File(s) 50'618'654'908 bytes
>>
>>
>> this means that with the 1 segment optimize I was able to reclaim 10 Gb
> on
>> disk that the 2 segments optimize could not achieve.
>>
>> how can this be explained? is that a normal behavior?
>>
>> thanks,
>>
>> vince
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Simon Willnauer <simon.willnauer@googlemail.com>
>>
>>
>> 20.07.2011 23:11
>> Please respond to
>> java-user@lucene.apache.org
>>
>>
>>
>> To
>> java-user@lucene.apache.org
>> cc
>>
>> Subject
>> Re: optimize with num segments > 1 index keeps growing
>>
>>
>>
>>
>>
>>
>> On Wed, Jul 20, 2011 at 2:00 PM,  <v.sevel@lombardodier.com> wrote:
>>> Hi,
>>>
>>> I index several millions small documents per day. each day, I remove
>> some
>>> of the older documents to keep the index at a stable number of
>> documents.
>>> after each purge, I commit then I optimize the index. what I found is
>> that
>>> if I keep optimizing with max num segments = 2, then the index keeps
>>> growing on the disk. but as soon as I optimize with just 1 segment, the
>>> space gets reclaimed on the disk. so, I have currently adopted the
>>> following strategy : every night I optimize with 2 segments, except
> once
>>> per week where I optimize with just 1 segment.
>>
>> what do you mean by keeps growing. you have n segments and you
>> optimize down to 2 and the index is bigger than the one with n
>> segments?
>>
>> simon
>>>
>>> is that an expected behavior?
>>> I guess I am doing something special because I was not able to
> reproduce
>>> this behavior in a unit test. what could it be?
>>>
>>> it would be nice to get some explanatory services within the product to
>>> help get some understanding on its behavior. something that tells you
>> some
>>> information about your index for instance (number of docs in the
>> different
>>> states, how the space is being used, ...). lucene is a wonderful
>> product,
>>> but to me this is almost like black magic, and when there is a specific
>>> behavior, I have got little clues to figure out something by myself.
>> some
>>> user oriented logging would be nice as well (the index writer info
>> stream
>>> is really verbose and very low level).
>>>
>>> thanks for your help,
>>>
>>>
>>> Vince
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
> ************************ DISCLAIMER ************************
> This message is intended only for use by the person to
> whom it is addressed. It may contain information that is
> privileged and confidential. Its content does not
> constitute a formal commitment by Lombard Odier
> Darier Hentsch & Cie or any of its branches or affiliates.
> If you are not the intended recipient of this message,
> kindly notify the sender immediately and destroy this
> message. Thank You.
> *****************************************************************
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message