lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: optimize with num segments > 1 index keeps growing
Date Tue, 13 Sep 2011 12:53:49 GMT
Excellent!

OK I opened https://issues.apache.org/jira/browse/LUCENE-3432

Mike McCandless

http://blog.mikemccandless.com

On Tue, Sep 13, 2011 at 8:03 AM,  <v.sevel@lombardodier.com> wrote:
> OK. that worked.
> thanks,
> vincent
>
>
>
>
>
>
>
>
>
>
> Michael McCandless <lucene@mikemccandless.com>
>
>
> 13.09.2011 12:44
> Please respond to
> java-user@lucene.apache.org
>
>
>
> To
> java-user@lucene.apache.org
> cc
>
> Subject
> Re: optimize with num segments > 1 index keeps growing
>
>
>
>
>
>
> OK thanks for the infoStream output -- it was very helpful!
>
> It looks like you have a single large segment that has deletions... it
> could be it's over the max merge size.  Can you try setting
> tmp.setMaxMergedSegmentMB to something very large and see if the
> expunge then runs?
>
> I think TMP shouldn't enforce this max during expunge, ie, it should
> always merge if the seg has too many deletions.  I'll open an issue...
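
A minimal sketch of the workaround suggested above (raising maxMergedSegmentMB so the
large segment becomes eligible before calling expungeDeletes), assuming the Lucene 3.3
IndexWriterConfig/TieredMergePolicy API; the index path, analyzer and class name below
are placeholders, not anything from the thread:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.TieredMergePolicy;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class ExpungeSketch {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File(args[0])); // index path (placeholder)
            TieredMergePolicy tmp = new TieredMergePolicy();
            tmp.setExpungeDeletesPctAllowed(0.0);    // consider any segment that has deletions
            tmp.setMaxMergedSegmentMB(1024 * 1024);  // "very large", so the big segment stays eligible
            IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33,
                    new StandardAnalyzer(Version.LUCENE_33));
            conf.setMergePolicy(tmp);
            IndexWriter writer = new IndexWriter(dir, conf);
            writer.expungeDeletes();  // waits for the expunge merges to finish
            writer.commit();
            writer.close();
        }
    }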
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Sep 12, 2011 at 2:10 PM,  <v.sevel@lombardodier.com> wrote:
>> Hi,
>>
>> here is the code:
>>
>>            writer.commit(); // make sure nothing is buffered
>>            mgr.printIndexState("Expunging deletes using "
>>                    + writer.getConfig().getMergePolicy());
>>            setDirectLogger(); // redirect infoStream toward log4j
>>            writer.expungeDeletes();
>>            writer.commit();
>>            mgr.printIndexState("Expunging deletes done");
>>
>>    public void printIndexState(String context) throws IOException {
>>        IndexReader reader = openReader();
>>
>>        try {
>>            log.info(context + ": numDocs=" + reader.numDocs()
>>                    + "; numDeletedDocs=" + reader.numDeletedDocs()
>>                    + "; indexSize=" + getIndexSizeMb() + "Mb");
>>        } finally {
>>            reader.close();
>>        }
>>    }
>>
>>    public IndexReader openReader() {
>>        try {
>>            return IndexReader.open(writer, false);
>>        } catch (IOException e) {
>>            log.warn("unable to open reader", e);
>>            throw new RuntimeException("unable to open reader", e);
>>        }
>>    }
>>
>> here is the log:
>>
>> 2011-09-12 20:07:02,009 INFO  [com.lodh.arte.logserver.LuceneMgr]
>> (LogServer optimize) Expunging deletes using [TieredMergePolicy:
>> maxMergeAtOnce=10, maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0,
>> floorSegmentMB=2.0, expungeDeletesPctAllowed=0.0, segmentsPerTier=10.0,
>> useCompoundFile=true, noCFSRatio=0.1: numDocs=25077714;
>> numDeletedDocs=4340449; indexSize=71324Mb
>> 2011-09-12 20:07:02,010 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IFD [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: setInfoStream
>>
> deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@5367839e
>> 2011-09-12 20:07:02,010 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]:
>>
> dir=org.apache.lucene.store.MMapDirectory@F:\logserver\index\INFRA-LOGSERVER1_UNIV_UNIV_DBIZ
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@30f224d9
>> index=_m5eqf(3.3):C19380892/4340449 _m5eqg(3.3):C9760743
> _m63c4(3.3):c1228
>> _m627z(3.3):c259956 _m637u(3.3):c7630 _m63e2(3.3):c873 _m63ae(3.3):c932
>> _m63ah(3.3):c936 _m63aj(3.3):c449 _m63ec(3.3):c570 _m63em(3.3):c886
>> _m63ce(3.3):c708 _m63ds(3.3):c705 _m63c2(3.3):c1174 _m63di(3.3):c481
>> version=3.3.0 1139782 - 2011-06-26 09:17:59
>> matchVersion=LUCENE_31
>> analyzer=org.apache.lucene.analysis.PerFieldAnalyzerWrapper
>> delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
>> commit=null
>> openMode=CREATE_OR_APPEND
>> similarity=org.apache.lucene.search.DefaultSimilarity
>> termIndexInterval=128
>> mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler
>> default WRITE_LOCK_TIMEOUT=1000
>> writeLockTimeout=1000
>> maxBufferedDeleteTerms=-1
>> ramBufferSizeMB=16.0
>> maxBufferedDocs=-1
>> mergedSegmentWarmer=null
>> mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10,
>> maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0,
> floorSegmentMB=2.0,
>> expungeDeletesPctAllowed=0.0, segmentsPerTier=10.0,
> useCompoundFile=true,
>> noCFSRatio=0.1
>> maxThreadStates=8
>> readerPooling=false
>> readerTermsIndexDivisor=1
>>
>> 2011-09-12 20:07:02,010 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: now trigger flush reason=explicit flush
>> 2011-09-12 20:07:02,010 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]:   start flush: applyAllDeletes=true
>> 2011-09-12 20:07:02,010 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]:   index before flush _m5eqf(3.3):C19380892/4340449
>> _m5eqg(3.3):C9760743 _m63c4(3.3):c1228 _m627z(3.3):c259956
>> _m637u(3.3):c7630 _m63e2(3.3):c873 _m63ae(3.3):c932 _m63ah(3.3):c936
>> _m63aj(3.3):c449 _m63ec(3.3):c570 _m63em(3.3):c886 _m63ce(3.3):c708
>> _m63ds(3.3):c705 _m63c2(3.3):c1174 _m63di(3.3):c481
>> 2011-09-12 20:07:02,010 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: DW: flush: no docs; skipping
>> 2011-09-12 20:07:02,010 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: apply all deletes during flush
>> 2011-09-12 20:07:02,010 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE BD 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: applyDeletes: no deletes; skipping
>> 2011-09-12 20:07:02,010 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE BD 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: prune sis=org.apache.lucene.index.SegmentInfos@2ab94ec7
>> minGen=182 packetCount=0
>> 2011-09-12 20:07:02,010 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: clearFlushPending
>> 2011-09-12 20:07:02,010 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: expungeDeletes: index now _m5eqf(3.3):C19380892/4340449
>> _m5eqg(3.3):C9760743 _m63c4(3.3):c1228 _m627z(3.3):c259956
>> _m637u(3.3):c7630 _m63e2(3.3):c873 _m63ae(3.3):c932 _m63ah(3.3):c936
>> _m63aj(3.3):c449 _m63ec(3.3):c570 _m63em(3.3):c886 _m63ce(3.3):c708
>> _m63ds(3.3):c705 _m63c2(3.3):c1174 _m63di(3.3):c481
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP: findMergesToExpungeDeletes
>> infos=_m5eqf(3.3):C19380892/4340449 _m5eqg(3.3):C9760743
> _m63c4(3.3):c1228
>> _m627z(3.3):c259956 _m637u(3.3):c7630 _m63e2(3.3):c873 _m63ae(3.3):c932
>> _m63ah(3.3):c936 _m63aj(3.3):c449 _m63ec(3.3):c570 _m63em(3.3):c886
>> _m63ce(3.3):c708 _m63ds(3.3):c705 _m63c2(3.3):c1174 _m63di(3.3):c481
>> expungeDeletesPctAllowed=0.0
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP: eligible=[_m5eqf(3.3):C19380892/4340449]
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: CMS: now merge
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: CMS:   index: _m5eqf(3.3):C19380892/4340449
>> _m5eqg(3.3):C9760743 _m63c4(3.3):c1228 _m627z(3.3):c259956
>> _m637u(3.3):c7630 _m63e2(3.3):c873 _m63ae(3.3):c932 _m63ah(3.3):c936
>> _m63aj(3.3):c449 _m63ec(3.3):c570 _m63em(3.3):c886 _m63ce(3.3):c708
>> _m63ds(3.3):c705 _m63c2(3.3):c1174 _m63di(3.3):c481
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: CMS:   no more merges pending; now return
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: commit: start
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: commit: enter lock
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: commit: now prepare
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: prepareCommit: flush
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: now trigger flush reason=explicit flush
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]:   start flush: applyAllDeletes=true
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]:   index before flush _m5eqf(3.3):C19380892/4340449
>> _m5eqg(3.3):C9760743 _m63c4(3.3):c1228 _m627z(3.3):c259956
>> _m637u(3.3):c7630 _m63e2(3.3):c873 _m63ae(3.3):c932 _m63ah(3.3):c936
>> _m63aj(3.3):c449 _m63ec(3.3):c570 _m63em(3.3):c886 _m63ce(3.3):c708
>> _m63ds(3.3):c705 _m63c2(3.3):c1174 _m63di(3.3):c481
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: DW: flush: no docs; skipping
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: apply all deletes during flush
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE BD 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: applyDeletes: no deletes; skipping
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE BD 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: prune sis=org.apache.lucene.index.SegmentInfos@2ab94ec7
>> minGen=182 packetCount=0
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: clearFlushPending
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: startCommit(): start
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]:   skip startCommit(): no changes pending
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: commit: pendingCommit == null; skip
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: commit: done
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: flush at getReader
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: now trigger flush reason=explicit flush
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]:   start flush: applyAllDeletes=false
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]:   index before flush _m5eqf(3.3):C19380892/4340449
>> _m5eqg(3.3):C9760743 _m63c4(3.3):c1228 _m627z(3.3):c259956
>> _m637u(3.3):c7630 _m63e2(3.3):c873 _m63ae(3.3):c932 _m63ah(3.3):c936
>> _m63aj(3.3):c449 _m63ec(3.3):c570 _m63em(3.3):c886 _m63ce(3.3):c708
>> _m63ds(3.3):c705 _m63c2(3.3):c1174 _m63di(3.3):c481
>> 2011-09-12 20:07:02,011 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: DW: flush: no docs; skipping
>> 2011-09-12 20:07:02,012 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: don't apply deletes now delTermCount=0 bytesUsed=0
>> 2011-09-12 20:07:02,012 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: clearFlushPending
>> 2011-09-12 20:07:02,013 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: return reader version=1305975958585
>> reader=ReadOnlyDirectoryReader(segments_4d755:nrt
>> _m5eqf(3.3):C19380892/4340449 _m5eqg(3.3):C9760743 _m63c4(3.3):c1228
>> _m627z(3.3):c259956 _m637u(3.3):c7630 _m63e2(3.3):c873 _m63ae(3.3):c932
>> _m63ah(3.3):c936 _m63aj(3.3):c449 _m63ec(3.3):c570 _m63em(3.3):c886
>> _m63ce(3.3):c708 _m63ds(3.3):c705 _m63c2(3.3):c1174 _m63di(3.3):c481)
>> 2011-09-12 20:07:02,013 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP: findMerges: 15 segments
>> 2011-09-12 20:07:02,013 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m5eqf(3.3):C19380892/4340449 size=39901.436 MB
>> [skip: too large]
>> 2011-09-12 20:07:02,013 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m5eqg(3.3):C9760743 size=19338.310 MB [skip: too
>> large]
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m627z(3.3):c259956 size=531.258 MB
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m637u(3.3):c7630 size=17.103 MB
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m63c2(3.3):c1174 size=3.026 MB
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m63ah(3.3):c936 size=2.663 MB
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m63ae(3.3):c932 size=2.659 MB
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m63c4(3.3):c1228 size=2.084 MB
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m63em(3.3):c886 size=1.771 MB [floored]
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m63ec(3.3):c570 size=1.682 MB [floored]
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m63ce(3.3):c708 size=1.671 MB [floored]
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m63di(3.3):c481 size=1.653 MB [floored]
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m63e2(3.3):c873 size=1.548 MB [floored]
>> 2011-09-12 20:07:02,014 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m63ds(3.3):c705 size=1.404 MB [floored]
>> 2011-09-12 20:07:02,015 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   seg=_m63aj(3.3):c449 size=1.389 MB [floored]
>> 2011-09-12 20:07:02,015 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: TMP:   allowedSegmentCount=22 vs count=15 (eligible count=13)
>> tooBigCount=2
>> 2011-09-12 20:07:02,015 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: CMS: now merge
>> 2011-09-12 20:07:02,015 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: CMS:   index: _m5eqf(3.3):C19380892/4340449
>> _m5eqg(3.3):C9760743 _m63c4(3.3):c1228 _m627z(3.3):c259956
>> _m637u(3.3):c7630 _m63e2(3.3):c873 _m63ae(3.3):c932 _m63ah(3.3):c936
>> _m63aj(3.3):c449 _m63ec(3.3):c570 _m63em(3.3):c886 _m63ce(3.3):c708
>> _m63ds(3.3):c705 _m63c2(3.3):c1174 _m63di(3.3):c481
>> 2011-09-12 20:07:02,015 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: CMS:   no more merges pending; now return
>> 2011-09-12 20:07:02,015 INFO  [com.lodh.arte.logserver.LuceneWriter]
>> (LogServer optimize) LUCENE IW 0 [Mon Sep 12 20:07:02 CEST 2011;
> LogServer
>> optimize]: getReader took 4 msec
>> 2011-09-12 20:07:02,016 INFO  [com.lodh.arte.logserver.LuceneMgr]
>> (LogServer optimize) Expunging deletes done: numDocs=25077714;
>> numDeletedDocs=4340449; indexSize=71324Mb
>>
>> regards,
>>
>> vincent
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Michael McCandless <lucene@mikemccandless.com>
>>
>>
>> 12.09.2011 18:20
>> Please respond to
>> java-user@lucene.apache.org
>>
>>
>>
>> To
>> java-user@lucene.apache.org
>> cc
>>
>> Subject
>> Re: optimize with num segments > 1 index keeps growing
>>
>>
>>
>>
>>
>>
>> Hmm... are you using IndexReader.numDeletedDocs to check?
>>
>> Did you commit from the writer and then reopen the IndexReader before
>> calling .numDeletedDocs?  Else the reader won't see the change.
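
For reference, a minimal sketch of the commit-then-reopen pattern described here; the
writer and dir variables are assumed to exist already:

    writer.expungeDeletes();
    writer.commit();                             // make the merged segments visible
    IndexReader reader = IndexReader.open(dir);  // a reader opened before the commit won't see the change
    try {
        System.out.println("numDeletedDocs=" + reader.numDeletedDocs()
                + " numDocs=" + reader.numDocs());
    } finally {
        reader.close();
    }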
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sat, Sep 10, 2011 at 11:58 PM,  <v.sevel@lombardodier.com> wrote:
>>> Hi, even with setExpungeDeletesPctAllowed(0.0), I could not get docs to
>>> get removed from disk.
>>> after the expunge+commit I print numDeletedDocs again, and it stays
>>> the same.
>>> regards,
>>> vincent
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Michael McCandless <lucene@mikemccandless.com>
>>>
>>>
>>> 09.09.2011 20:53
>>> Please respond to
>>> java-user@lucene.apache.org
>>>
>>>
>>>
>>> To
>>> java-user@lucene.apache.org
>>> cc
>>>
>>> Subject
>>> Re: optimize with num segments > 1 index keeps growing
>>>
>>>
>>>
>>>
>>>
>>>
>>> TieredMergePolicy by default will only merge a segment if it has > 10%
>>> deletions.
>>>
>>> Can you try calling .setExpungeDeletesPctAllowed(0.0) and then expunge
>>> again?
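
A minimal sketch of that call, assuming the writer is already configured with a
TieredMergePolicy (the writer variable is assumed to exist):

    TieredMergePolicy tmp = (TieredMergePolicy) writer.getConfig().getMergePolicy();
    tmp.setExpungeDeletesPctAllowed(0.0);  // default is 10.0, so segments with fewer deletions are skipped
    writer.expungeDeletes();
    writer.commit();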
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Fri, Sep 9, 2011 at 1:41 PM,  <v.sevel@lombardodier.com> wrote:
>>>> Hi,
>>>>
>>>> this post is quite old, but I would like to share some recent
>>>> developments.
>>>>
>>>> I applied the recommendation. my process became: expunge deletes and
>>>> optimize 2 segments.
>>>>
>>>> at the time I was with lucene 3.1 and that solved my issue. recently I
>>>> moved to lucene 3.3, and I tried playing with the new tiered merge
>>> policy.
>>>> what I found was that after an expunge, the number of deleted docs would
>>>> stay the same, and space would not be reclaimed on disk. I switched back
>>>> to the default merge policy (LogByteSizeMergePolicy: minMergeSize=1677721,
>>>> mergeFactor=10, maxMergeSize=2147483648,
>>>> maxMergeSizeForOptimize=9223372036854775807, calibrateSizeByDeletes=true,
>>>> maxMergeDocs=2147483647, useCompoundFile=true, noCFSRatio=0.1) and this
>>>> time got the right behavior: space was reclaimed on disk. I even tried
>>>> with the BalancedSegmentMergePolicy and again got the right behavior.
>>>>
>>>> so this issue seems to affect only the tiered merge policy.
>>>>
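
As an illustration, a sketch of the kind of switch described above, back to
LogByteSizeMergePolicy; dir and analyzer are assumed to exist, and the only setting
shown is the calibrateSizeByDeletes value quoted from the policy above:

    LogByteSizeMergePolicy lmp = new LogByteSizeMergePolicy();
    lmp.setCalibrateSizeByDeletes(true);  // take deletions into account when sizing merges
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33, analyzer);
    conf.setMergePolicy(lmp);
    IndexWriter writer = new IndexWriter(dir, conf);
    writer.expungeDeletes();
    writer.commit();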
>>>> to illustrate this, I took an index with many deleted docs, then
>>>> expunged/optimized while using the tiered policy, then did the same thing
>>>> with the default merge policy. here is the content of the directory at
>>>> each step:
>>>>
>>>> before:
>>>>
>>>> 09.09.2011  17:38                20 segments.gen
>>>> 09.09.2011  17:38             5'335 segments_4bf1u
>>>> 06.09.2011  15:27                 0 write.lock
>>>> 06.09.2011  00:49    31'681'157'794 _jhwld.fdt
>>>> 06.09.2011  00:49       115'562'268 _jhwld.fdx
>>>> 06.09.2011  00:37             5'347 _jhwld.fnm
>>>> 06.09.2011  01:13     7'147'947'472 _jhwld.frq
>>>> 06.09.2011  01:13     3'927'649'164 _jhwld.prx
>>>> 06.09.2011  01:13        41'992'760 _jhwld.tii
>>>> 06.09.2011  01:13     3'745'729'056 _jhwld.tis
>>>> 09.09.2011  00:27         1'805'669 _jhwld_3.del
>>>> 09.09.2011  00:31    11'397'619'448 _jtrwg.fdt
>>>> 09.09.2011  00:31        98'393'316 _jtrwg.fdx
>>>> 09.09.2011  00:27             5'347 _jtrwg.fnm
>>>> 09.09.2011  00:47     5'146'273'732 _jtrwg.frq
>>>> 09.09.2011  00:47     1'661'436'146 _jtrwg.prx
>>>> 09.09.2011  00:47        23'950'194 _jtrwg.tii
>>>> 09.09.2011  00:47     2'139'903'139 _jtrwg.tis
>>>> 09.09.2011  07:39        94'471'867 _jugaa.cfs
>>>> 09.09.2011  10:14       252'716'611 _juok2.cfs
>>>> 09.09.2011  15:45         7'986'102 _jwuaq.cfs
>>>> 09.09.2011  16:00         5'780'703 _jx45g.cfs
>>>> 09.09.2011  16:00       333'981'384 _jx46a.cfs
>>>> 09.09.2011  16:23        20'955'761 _jxge0.cfs
>>>> 09.09.2011  16:46        19'258'025 _jxmas.cfs
>>>> 09.09.2011  16:55        16'622'800 _jxpv4.cfs
>>>> 09.09.2011  17:10        14'605'028 _jxvd6.cfs
>>>> 09.09.2011  17:34        12'456'476 _jy28o.cfs
>>>> 09.09.2011  17:38         2'584'950 _jy91y.cfs
>>>> 09.09.2011  17:38         2'595'049 _jy92i.cfs
>>>> 09.09.2011  17:38         2'600'991 _jy932.cfs
>>>> 09.09.2011  17:38         2'610'278 _jy93m.cfs
>>>> 09.09.2011  17:38            46'664 _jy93x.cfs
>>>> 09.09.2011  17:38             9'765 _jy93y.cfs
>>>> 09.09.2011  17:38            10'691 _jy93z.cfs
>>>> 09.09.2011  17:38             9'533 _jy940.cfs
>>>> 09.09.2011  17:38            11'684 _jy941.cfs
>>>> 09.09.2011  17:38             8'996 _jy942.cfs
>>>>              38 File(s) 67'918'759'565 bytes
>>>>
>>>>
>>>> after expunge/optimize (tiered merge policy):
>>>>
>>>> 09.09.2011  18:02                20 segments.gen
>>>> 09.09.2011  18:02             3'171 segments_4bf3g
>>>> 06.09.2011  15:27                 0 write.lock
>>>> 06.09.2011  00:49    31'681'157'794 _jhwld.fdt
>>>> 06.09.2011  00:49       115'562'268 _jhwld.fdx
>>>> 06.09.2011  00:37             5'347 _jhwld.fnm
>>>> 06.09.2011  01:13     7'147'947'472 _jhwld.frq
>>>> 06.09.2011  01:13     3'927'649'164 _jhwld.prx
>>>> 06.09.2011  01:13        41'992'760 _jhwld.tii
>>>> 06.09.2011  01:13     3'745'729'056 _jhwld.tis
>>>> 09.09.2011  17:39         1'805'669 _jhwld_4.del
>>>> 09.09.2011  17:45    11'814'367'373 _jy9iy.fdt
>>>> 09.09.2011  17:45       101'565'036 _jy9iy.fdx
>>>> 09.09.2011  17:39             5'347 _jy9iy.fnm
>>>> 09.09.2011  18:01     5'328'530'169 _jy9iy.frq
>>>> 09.09.2011  18:01     1'733'490'572 _jy9iy.prx
>>>> 09.09.2011  18:01        25'072'713 _jy9iy.tii
>>>> 09.09.2011  18:01     2'239'702'399 _jy9iy.tis
>>>> 09.09.2011  18:02           185'962 _jy9mv.cfs
>>>> 09.09.2011  18:02             9'955 _jy9mw.cfs
>>>> 09.09.2011  18:02            10'380 _jy9mx.cfs
>>>> 09.09.2011  18:02             9'341 _jy9my.cfs
>>>> 09.09.2011  18:02             9'228 _jy9mz.cfs
>>>> 09.09.2011  18:02            10'382 _jy9n0.cfs
>>>> 09.09.2011  18:02             9'345 _jy9n1.cfs
>>>> 09.09.2011  18:02             9'231 _jy9n2.cfs
>>>> 09.09.2011  18:02             8'961 _jy9n3.cfs
>>>> 09.09.2011  18:02            10'381 _jy9n4.cfs
>>>> 09.09.2011  18:02           199'651 _jy9n5.cfs
>>>> 09.09.2011  18:02             9'345 _jy9n6.cfs
>>>> 09.09.2011  18:02             9'230 _jy9n7.cfs
>>>>              31 File(s) 67'905'077'722 bytes
>>>>
>>>> after expungeDeletes/optimize with the default merge policy:
>>>>
>>>> 09.09.2011  19:31                20 segments.gen
>>>> 09.09.2011  19:31             2'081 segments_4bfpe
>>>> 09.09.2011  18:13                 0 write.lock
>>>> 09.09.2011  18:42    30'133'772'814 _jyb4c.fdt
>>>> 09.09.2011  18:42       103'164'812 _jyb4c.fdx
>>>> 09.09.2011  18:27             5'347 _jyb4c.fnm
>>>> 09.09.2011  19:03     6'474'023'590 _jyb4c.frq
>>>> 09.09.2011  19:03     3'699'406'141 _jyb4c.prx
>>>> 09.09.2011  19:03        37'900'657 _jyb4c.tii
>>>> 09.09.2011  19:03     3'380'266'875 _jyb4c.tis
>>>> 09.09.2011  19:15    11'820'477'088 _jyb4e.fdt
>>>> 09.09.2011  19:15       101'659'700 _jyb4e.fdx
>>>> 09.09.2011  19:03             5'347 _jyb4e.fnm
>>>> 09.09.2011  19:29     5'333'219'797 _jyb4e.frq
>>>> 09.09.2011  19:29     1'734'633'179 _jyb4e.prx
>>>> 09.09.2011  19:29        25'105'023 _jyb4e.tii
>>>> 09.09.2011  19:29     2'242'558'333 _jyb4e.tis
>>>> 09.09.2011  19:31           223'600 _jyb5t.cfs
>>>> 09.09.2011  19:31             9'545 _jyb5u.cfs
>>>> 09.09.2011  19:31             8'963 _jyb5v.cfs
>>>> 09.09.2011  19:31             9'250 _jyb5w.cfs
>>>> 09.09.2011  19:31             9'047 _jyb5x.cfs
>>>> 09.09.2011  19:31            11'253 _jyb5y.cfs
>>>> 09.09.2011  19:31            11'239 _jyb5z.cfs
>>>>              24 File(s) 65'086'483'701 bytes
>>>>
>>>> any clue as to what is happening?
>>>>
>>>> thanks,
>>>>
>>>>
>>>> Vincent
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> "Uwe Schindler" <uwe@thetaphi.de>
>>>>
>>>>
>>>> 21.07.2011 22:46
>>>> Please respond to
>>>> java-user@lucene.apache.org
>>>>
>>>>
>>>>
>>>> To
>>>> <java-user@lucene.apache.org>
>>>> cc
>>>>
>>>> Subject
>>>> RE: optimize with num segments > 1 index keeps growing
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> There is also expungeDeletes()...
>>>>
>>>> -----
>>>> Uwe Schindler
>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>> http://www.thetaphi.de
>>>> eMail: uwe@thetaphi.de
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: v.sevel@lombardodier.com [mailto:v.sevel@lombardodier.com]
>>>>> Sent: Thursday, July 21, 2011 8:39 PM
>>>>> To: java-user@lucene.apache.org
>>>>> Subject: Re: optimize with num segments > 1 index keeps growing
>>>>>
>>>>> Hi, thanks for this explanation.
>>>>> so what is the best solution: merge the large segment (how can I do
>>>>> that?) or work with many segments (10?) so that I avoid this "large
>>>>> segment" issue?
>>>>> thanks,
>>>>> vince
>>>>>
>>>>>
>>>>> Vincent Sevel
>>>>> Lombard Odier Darier Hentsch & Cie
>>>>> 11, rue de la Corraterie - 1204 Genève - Suisse T +41 22 709 3376 - F
>>>> +41
>>>> 22 709
>>>>> 3782 www.lombardodier.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Simon Willnauer <simon.willnauer@googlemail.com>
>>>>>
>>>>>
>>>>> 21.07.2011 20:06
>>>>> Please respond to
>>>>> java-user@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>> To
>>>>> java-user@lucene.apache.org
>>>>> cc
>>>>>
>>>>> Subject
>>>>> Re: optimize with num segments > 1 index keeps growing
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> so the problem here is that you have one really big segment _52aho.* and
>>>>> several smaller ones _7e0wz.*, _7e0xu.*, _7e1x5.* ....
>>>>> if you optimize to 2 segments, all the smaller segments are merged into
>>>>> one but the large segment remains untouched. This means that the deleted
>>>>> documents in the large segment are not removed / freed, while if you
>>>>> optimize down to one segment they are removed. In the single-segment
>>>>> index there is no *.del file present, meaning no deletes. Unless you merge
>>>>> the large segment, your deleted documents are only marked as deleted but
>>>>> not yet removed.
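
A short sketch of the difference being described, with the writer variable assumed
(Lucene 3.x API):

    // optimize down to 2 segments: the smaller segments are merged, but the
    // biggest segment can be left untouched, so its *.del entries keep
    // occupying disk space
    writer.optimize(2);

    // optimize down to 1 segment: every segment is rewritten, so deleted
    // documents are physically dropped and the space is reclaimed
    writer.optimize(1);
    writer.commit();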
>>>>>
>>>>> simon
>>>>>
>>>>> On Thu, Jul 21, 2011 at 5:50 PM,  <v.sevel@lombardodier.com> wrote:
>>>>> > hi,
>>>>> > closing after the 2 segments optimize does not change it.
>>>>> > also I am running with lucene 3.1.0.
>>>>> > cheers,
>>>>> > vince
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > Ian Lea <ian.lea@gmail.com>
>>>>> >
>>>>> >
>>>>> > 21.07.2011 17:30
>>>>> > Please respond to
>>>>> > java-user@lucene.apache.org
>>>>> >
>>>>> >
>>>>> >
>>>>> > To
>>>>> > java-user@lucene.apache.org
>>>>> > cc
>>>>> >
>>>>> > Subject
>>>>> > Re: optimize with num segments > 1 index keeps growing
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > A write.lock file with timestamp of 13:58 is in all the listings.
>> The
>>>>> > first thing I'd try is to add some IndexWriter.close() calls.
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Ian.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Thu, Jul 21, 2011 at 4:05 PM,  <v.sevel@lombardodier.com> wrote:
>>>>> >> Hi,
>>>>> >>
>>>>> >> here is a concrete example.
>>>>> >>
>>>>> >> I am starting with an index that has 19017236 docs, which takes 58989
>>>>> >> Mb on disk:
>>>>> >>
>>>>> >> 21.07.2011 15:21                20 segments.gen
>>>>> >> 21.07.2011 15:21             2'974 segments_2acy4
>>>>> >> 21.07.2011 13:58                 0 write.lock
>>>>> >> 16.07.2011  02:21    33'445'798'886 _52aho.fdt
>>>>> >> 16.07.2011  02:21       178'723'932 _52aho.fdx
>>>>> >> 16.07.2011  01:58             5'002 _52aho.fnm
>>>>> >> 16.07.2011  03:10     9'857'410'889 _52aho.frq
>>>>> >> 16.07.2011  03:10     4'538'234'846 _52aho.prx
>>>>> >> 16.07.2011  03:10        61'581'767 _52aho.tii
>>>>> >> 16.07.2011  03:10     5'505'039'790 _52aho.tis
>>>>> >> 21.07.2011 01:01         1'899'536 _52aho_5.del
>>>>> >> 21.07.2011 01:05     4'222'206'034 _6t61z.fdt
>>>>> >> 21.07.2011 01:05        21'424'556 _6t61z.fdx
>>>>> >> 21.07.2011 01:01             5'002 _6t61z.fnm
>>>>> >> 21.07.2011 01:12     1'170'370'187 _6t61z.frq
>>>>> >> 21.07.2011  01:12       598'373'388 _6t61z.prx
>>>>> >> 21.07.2011  01:12         7'574'912 _6t61z.tii
>>>>> >> 21.07.2011  01:12       678'766'206 _6t61z.tis
>>>>> >> 21.07.2011  13:46     1'458'592'058 _7d6me.cfs
>>>>> >> 21.07.2011  13:48        15'702'654 _7dhgz.cfs
>>>>> >> 21.07.2011  13:52        16'800'942 _7dphm.cfs
>>>>> >> 21.07.2011  13:55        16'714'431 _7dxht.cfs
>>>>> >> 21.07.2011  14:24        17'505'435 _7e0wz.cfs
>>>>> >> 21.07.2011  14:24         5'875'852 _7e0xu.cfs
>>>>> >> 21.07.2011  14:48        18'340'470 _7e1x5.cfs
>>>>> >> 21.07.2011  15:19        16'978'564 _7e3ck.cfs
>>>>> >> 21.07.2011  15:21         1'208'656 _7e3hv.cfs
>>>>> >> 21.07.2011  15:21            19'361 _7e3hw.cfs
>>>>> >>              28 File(s) 61'855'156'350 bytes
>>>>> >>
>>>>> >> I am deleting some of the older documents. after the delete, I commit
>>>>> >> and then optimize down to 2 segments. at the end of the optimize the
>>>>> >> index contains 18702510 docs (314727 were deleted) and now takes 58975
>>>>> >> Mb on disk:
>>>>> >>
>>>>> >> 21.07.2011  15:37                20 segments.gen
>>>>> >> 21.07.2011  15:37               524 segments_2acy6
>>>>> >> 21.07.2011  13:58                 0 write.lock
>>>>> >> 16.07.2011  02:21    33'445'798'886 _52aho.fdt
>>>>> >> 16.07.2011  02:21       178'723'932 _52aho.fdx
>>>>> >> 16.07.2011  01:58             5'002 _52aho.fnm
>>>>> >> 16.07.2011  03:10     9'857'410'889 _52aho.frq
>>>>> >> 16.07.2011  03:10     4'538'234'846 _52aho.prx
>>>>> >> 16.07.2011  03:10        61'581'767 _52aho.tii
>>>>> >> 16.07.2011  03:10     5'505'039'790 _52aho.tis
>>>>> >> 21.07.2011  15:23         1'999'945 _52aho_6.del
>>>>> >> 21.07.2011  15:31     5'194'848'138 _7e3hy.fdt
>>>>> >> 21.07.2011  15:31        28'613'668 _7e3hy.fdx
>>>>> >> 21.07.2011  15:25             5'002 _7e3hy.fnm
>>>>> >> 21.07.2011  15:37     1'529'771'296 _7e3hy.frq
>>>>> >> 21.07.2011  15:37       726'582'244 _7e3hy.prx
>>>>> >> 21.07.2011  15:37         8'518'198 _7e3hy.tii
>>>>> >> 21.07.2011  15:37       763'213'144 _7e3hy.tis
>>>>> >>              18 File(s) 61'840'347'291 bytes
>>>>> >>
>>>>> >> as you can see, the size on disk did not really change. at this point
>>>>> >> I optimize down to 1 segment, and at the end the index takes 48273 Mb
>>>>> >> on disk:
>>>>> >>
>>>>> >> 21.07.2011  16:46                20 segments.gen
>>>>> >> 21.07.2011  16:46               278 segments_2acy8
>>>>> >> 21.07.2011  13:58                 0 write.lock
>>>>> >> 21.07.2011  16:06    32'901'423'750 _7e3hz.fdt
>>>>> >> 21.07.2011  16:06       149'582'052 _7e3hz.fdx
>>>>> >> 21.07.2011  15:42             5'002 _7e3hz.fnm
>>>>> >> 21.07.2011  16:46     8'608'541'177 _7e3hz.frq
>>>>> >> 21.07.2011  16:46     4'392'616'115 _7e3hz.prx
>>>>> >> 21.07.2011  16:46        50'571'856 _7e3hz.tii
>>>>> >> 21.07.2011  16:46     4'515'914'658 _7e3hz.tis
>>>>> >>              10 File(s) 50'618'654'908 bytes
>>>>> >>
>>>>> >>
>>>>> >> this means that with the 1-segment optimize I was able to reclaim 10
>>>>> >> Gb on disk that the 2-segment optimize could not.
>>>>> >>
>>>>> >> how can this be explained? is this normal behavior?
>>>>> >>
>>>>> >> thanks,
>>>>> >>
>>>>> >> vince
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> Simon Willnauer <simon.willnauer@googlemail.com>
>>>>> >>
>>>>> >>
>>>>> >> 20.07.2011 23:11
>>>>> >> Please respond to
>>>>> >> java-user@lucene.apache.org
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> To
>>>>> >> java-user@lucene.apache.org
>>>>> >> cc
>>>>> >>
>>>>> >> Subject
>>>>> >> Re: optimize with num segments > 1 index keeps growing
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Wed, Jul 20, 2011 at 2:00 PM,  <v.sevel@lombardodier.com>
> wrote:
>>>>> >>> Hi,
>>>>> >>>
>>>>> >>> I index several million small documents per day. each day, I remove
>>>>> >>> some of the older documents to keep the index at a stable number of
>>>>> >>> documents. after each purge, I commit and then optimize the index.
>>>>> >>> what I found is that if I keep optimizing with max num segments = 2,
>>>>> >>> the index keeps growing on disk, but as soon as I optimize down to 1
>>>>> >>> segment, the space gets reclaimed. so I have currently adopted the
>>>>> >>> following strategy: every night I optimize with 2 segments, except
>>>>> >>> once per week when I optimize with just 1 segment.
>>>>> >>
>>>>> >> what do you mean by "keeps growing"? you have n segments, you
>>>>> >> optimize down to 2, and the index is bigger than the one with n
>>>>> >> segments?
>>>>> >>
>>>>> >> simon
>>>>> >>>
>>>>> >>> is this expected behavior?
>>>>> >>> I guess I am doing something special, because I was not able to
>>>>> >>> reproduce this behavior in a unit test. what could it be?
>>>>> >>>
>>>>> >>> it would be nice to have some explanatory services within the product
>>>>> >>> to help understand its behavior: something that tells you information
>>>>> >>> about your index, for instance (number of docs in the different
>>>>> >>> states, how the space is being used, ...). lucene is a wonderful
>>>>> >>> product, but to me this is almost like black magic, and when there is
>>>>> >>> a specific behavior, I have few clues to figure it out by myself. some
>>>>> >>> user-oriented logging would be nice as well (the index writer info
>>>>> >>> stream is really verbose and very low level).
>>>>> >>>
>>>>> >>> thanks for your help,
>>>>> >>>
>>>>> >>>
>>>>> >>> Vince
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

