lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: lots of .cfs (compound files) in the index directory
Date Thu, 15 Mar 2012 15:04:34 GMT
Hmm, which OS/filesystem is the index on?  Which Directory impl are
you using...?

Indeed CheckIndex sees the 5111 segments, so something seems amiss.

The log was edited right?  EG I can't see the IW open log entries...
and IW 51 is missing (it goes straight from 50 to 52).  Can you post
the full log?  Also, can you attach it (instead of inlining it), maybe
gzip'd, so it doesn't get line-wrapped?

Finally, can you run things a big longer, so we can see the
pathological behavior emerge?

The only odd thing I see in the log is that IW seems to be attempting
to delete files (eg _1a.cfs) that are still "live", which is odd.

That the delete did not succeed could just be that the file is open
from a reader (do you have a reader open on the index?)... though, I
would have expected to see "unable to remove file..." in the
InfoStream output.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Mar 15, 2012 at 10:30 AM, Tim Bogaert <bogguard@gmail.com> wrote:
> Hi Mike,
>
> Segment format: FORMAT_3_1 [Lucene 3.1+]
> NumBadSegments: 0
> NumSegments: 5111
> Clean?: true
> TotLoseDocCount: 0
> Partial: false
>
>
> I reindexed and let the feed come in again...The logs of the result are
> below (I just copied a few) after 10 minutes :
> Segment format: FORMAT_3_1 [Lucene 3.1+]
> NumBadSegments: 0
> NumSegments: 11
> Clean?: true
> TotLoseDocCount: 0
> Partial: false
>
> What I noticed in the log was that _1a.cfs is getting deleted, but it seems
> to be still in the directory :
>
> ls -al
> -rw-r--r--   1 web      web      1444172 Mar 15 15:13 _17.fdt
> -rw-r--r--   1 web      web        88804 Mar 15 15:13 _17.fdx
> -rw-r--r--   1 web      web          142 Mar 15 15:13 _17.fnm
> -rw-r--r--   1 web      web      4352933 Mar 15 15:13 _17.frq
> -rw-r--r--   1 web      web       177604 Mar 15 15:13 _17.nrm
> -rw-r--r--   1 web      web      5931105 Mar 15 15:13 _17.prx
> -rw-r--r--   1 web      web        18159 Mar 15 15:13 _17.tii
> -rw-r--r--   1 web      web      1421338 Mar 15 15:13 _17.tis
> -rw-r--r--   1 web      web           36 Mar 15 15:14 _17_1.del
> -rw-r--r--   1 web      web       379400 Mar 15 15:13 _18.cfs
> -rw-r--r--   1 web      web       424376 Mar 15 15:13 _19.cfs
> -rw-r--r--   1 web      web       463177 Mar 15 15:13 _1a.cfs
> -rw-r--r--   1 web      web       108875 Mar 15 15:13 _1b.cfs
> -rw-r--r--   1 web      web         4481 Mar 15 15:14 _1c.cfs
> -rw-r--r--   1 web      web         4465 Mar 15 15:15 _1d.cfs
> -rw-r--r--   1 web      web         4534 Mar 15 15:17 _1e.cfs
> -rw-r--r--   1 web      web         3252 Mar 15 15:18 _1f.cfs
> -rw-r--r--   1 web      web         3314 Mar 15 15:21 _1g.cfs
> -rw-r--r--   1 web      web         2340 Mar 15 15:25 _1h.cfs
> -rw-r--r--   1 web      web         3902 Mar 15 15:27 _1i.cfs
> -rw-r--r--   1 web      web         1687 Mar 15 15:28 _1j.cfs
> -rw-r--r--   1 web      web           20 Mar 15 15:28 segments.gen
> -rw-r--r--   1 web      web         2735 Mar 15 15:28 segments_1h
>
>
> Here is (part of) the log :
>
> 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_h(3.5):c1 size=0.007 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_b(3.5):c1 size=0.007 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_g(3.5):c1 size=0.006 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_c(3.5):c1 size=0.006 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_11(3.5):c1 size=0.005 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_12(3.5):c1 size=0.005 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> seg=_6(3.5):c1 size=0.005 MB [merging] [floored]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: TMP:
> allowedSegmentCount=1 vs count=15 (eligible count=5) tooBigCount=0
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: merge time 46
> msec for 10 docs
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now flush at close
> waitForMerges=true
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS:   merge
> thread: done
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now trigger flush
> reason=explicit flush
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #2]: CMS: set
> priority of merge thread Lucene Merge Thread #3 to 6
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:   start flush:
> applyAllDeletes=true
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]:   index before flush
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
> _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
> _h(3.5):c1 _11(3.5):c1 _12(3.5):c1
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: DW: flush: no docs;
> skipping
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: apply all deletes
> during flush
> BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: applyDeletes: no
> deletes; skipping
> BD 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: prune
> sis=org.apache.lucene.index.SegmentInfos@1cad1e6 minGen=0 packetCount=0
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: clearFlushPending
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS: now merge
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS:   index:
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1
> _18(3.5):c10 _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1
> _h(3.5):c1 _11(3.5):c1 _12(3.5):c1
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: CMS:   no more
> merges pending; now return
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge store
> matchedCount=10 vs 10
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merged
> segment size=0.076 MB vs estimate=0.103 MB
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: commitMerge:
> _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1 _b(3.5):c1 _g(3.5):c1
> _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19 index=_0(3.5):C73
> _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _16(3.5):c10 _6(3.5):c1 _18(3.5):c10
> _17(3.5):c10 _b(3.5):c1 _c(3.5):c1 _d(3.5):c1 _g(3.5):c1 _h(3.5):c1
> _11(3.5):c1 _12(3.5):c1
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]:
> commitMergeDeletes _2(3.5):C23 _3(3.5):c7 _d(3.5):c1 _h(3.5):c1 _b(3.5):c1
> _g(3.5):c1 _c(3.5):c1 _11(3.5):c1 _12(3.5):c1 _6(3.5):c1 into _19
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: after commit:
> _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10 _17(3.5):c10
> IFD [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: now checkpoint
> "segments_15" [6 segments ; isCommit = false]
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: merge time 61
> msec for 38 docs
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: waitForMerges done
> IW 50 [Thu Mar 15 15:25:23 MET 2012; Lucene Merge Thread #3]: CMS:   merge
> thread: done
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now call final
> commit()
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: start
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: enter lock
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: already
> prepared
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit:
> pendingCommit != null
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: wrote
> segments file "segments_16"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: now checkpoint
> "segments_16" [42 segments ; isCommit = true]
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: deleteCommits: now
> decRef commit "segments_15"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "segments_15"
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: commit: done
> IW 50 [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: at close:
> _0(3.5):C73 _1(3.5):C70 _19(3.5):C38 _16(3.5):c10 _18(3.5):c10 _17(3.5):c10
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tii"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.tis"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.nrm"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fnm"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdt"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_17.cfs"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.prx"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_16.cfs"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.fdx"
> IFD [Thu Mar 15 15:25:23 MET 2012; pool-2-thread-1]: delete "_19.frq"
> Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: applyDeletes: no deletes;
> skipping
> BD 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: prune
> sis=org.apache.lucene.index.SegmentInfos@59d9fe minGen=0 packetCount=0
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: clearFlushPending
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS: now merge
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS:   index:
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10 _19(3.5):c10
> _18(3.5):c10
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: CMS:   no more
> merges pending; now return
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: waitForMerges done
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now call final
> commit()
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: start
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: enter lock
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: already
> prepared
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit:
> pendingCommit != null
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: wrote
> segments file "segments_17"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: now checkpoint
> "segments_17" [43 segments ; isCommit = true]
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: deleteCommits: now
> decRef commit "segments_16"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "segments_16"
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: commit: done
> IW 52 [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: at close:
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _1a(3.5):C16 _17(3.5):c10 _19(3.5):c10
> _18(3.5):c10
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdx"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tis"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.frq"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.tii"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fnm"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.fdt"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.nrm"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_19.cfs"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_17.cfs"
> IFD [Thu Mar 15 15:25:27 MET 2012; pool-2-thread-1]: delete "_1a.prx"
>
> IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #3]: CMS:   merge
> thread: done
> IW 53 [Thu Mar 15 15:25:38 MET 2012; Lucene Merge Thread #0]: CMS: set
> priority of merge thread Lucene Merge Thread #3 to 6
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now flush at close
> waitForMerges=true
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now trigger flush
> reason=explicit flush
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]:   start flush:
> applyAllDeletes=true
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]:   index before flush
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
> _1a(3.5):c10 _19(3.5):c10
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: DW: flush: no docs;
> skipping
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: apply all deletes
> during flush
> BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: applyDeletes: no
> deletes; skipping
> BD 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: prune
> sis=org.apache.lucene.index.SegmentInfos@e91b11 minGen=0 packetCount=0
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: clearFlushPending
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS: now merge
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS:   index:
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
> _1a(3.5):c10 _19(3.5):c10
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: CMS:   no more
> merges pending; now return
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: waitForMerges done
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now call final
> commit()
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: start
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: enter lock
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: already
> prepared
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit:
> pendingCommit != null
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: wrote
> segments file "segments_18"
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: now checkpoint
> "segments_18" [44 segments ; isCommit = true]
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: deleteCommits: now
> decRef commit "segments_17"
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "segments_17"
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: commit: done
> IW 53 [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: at close:
> _0(3.5):C73 _1(3.5):C70 _2(3.5):C23 _3(3.5):c7 _18(3.5):c10 _1b(3.5):c10
> _1a(3.5):c10 _19(3.5):c10
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1b.cfs"
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_18.cfs"
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_19.cfs"
> IFD [Thu Mar 15 15:25:38 MET 2012; pool-2-thread-1]: delete "_1a.cfs"
>
> Kind regards,
> Bogguard
>
> On Thu, Mar 15, 2012 at 1:21 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> Hmm, that's odd...
>>
>> Can you set IndexWriter's infoStream and then capture the output while
>> doing the small writes every few seconds and post back?
>>
>> If you run CheckIndex on the index does it also report ~3000 segments?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, Mar 15, 2012 at 7:14 AM, Tim Bogaert <bogguard@gmail.com> wrote:
>> > Hi,
>> >
>> > We migrated our project to Lucene 3.5 and used the TieredMergePolicy with
>> > all its defaults.
>> > Our project contains a relatively small index (+-90MB) and gets small
>> > writes every few seconds.
>> > The behaviour we see is that when we reindex everything works great and
>> > performant,
>> > but when we keep the index running for some days we see a lot (+-3000) of
>> > very small .cfs files (between 1KB and 10KB) are being generated and the
>> > performance drops significantly.
>> > Reïndexing solves the problem, but this shouldn't be the solution.
>> >
>> > Is there a way to change the behaviour of the .cfs files. I noticed you
>> can
>> > fully disable it, but I read that this isn't adviced.
>> >
>> > Kind regards,
>> > Bogguard
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message