lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Deferring merging of index segments
Date Mon, 04 Jun 2012 20:48:08 GMT
Awesome, thanks for bringing closure, Vitaly.

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jun 4, 2012 at 3:10 PM, Vitaly Funstein <vfunstein@gmail.com> wrote:
> Thanks for the tip, Mike. After changing the three calls
>
> IndexWriter.commit();
>
> <revert merge policy to allow merging to happen>
>
> IndexWriter.maybeMerge();
> IndexWriter.waitForMerges();
>
> to simply calling IndexWriter.close(true), the disk size and run time are
> now very close to the case of parallel segment merges.
>
> On Sat, Jun 2, 2012 at 6:43 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
>
>> On Fri, Jun 1, 2012 at 8:09 PM, Vitaly Funstein <vfunstein@gmail.com> wrote:
>> > Yes, I am only calling IndexWriter.addDocument()
>>
>> OK.
>>
>> > Interestingly, relative performance of either approach seems to greatly
>> > depend on the number of documents per index. In both types of runs, I used
>> > 10 writer threads, each writing documents with the same set of fields (but
>> > random values), into its own index as fast as possible, on a 16-core box,
>> > using a rotational disk for index storage (results from my original post
>> > were obtained from a Fusion IO drive, and an even higher # of cores per
>> > machine).
>>
>> Mmmmm Fusion IO drive :)
>>
>> > For smaller index sizes, the choice of whether to merge segments
>> > in parallel makes much less of a difference, if at all.
>> >
>> > So the matrix looks like this:
>> >
>> > # docs/index   concurrent merges?   total time, sec   total disk size
>> > =====================================================================
>> > 200K           Y                    56.8              1.5 G
>> > 200K           N                    59.6              2.6 G
>> > 1M             Y                    304               7.4 G
>> > 1M             N                    493               14 G
>> >
>> > As you can see, the total size on disk is always much larger when merging
>> > at the end; here are directory listings, for each case:
>>
>> OK so for a biggish index merging concurrently is faster; this is what
>> I'd expect.
>>
>> > Concurrent merging:
>> >
>> > total 150M
>> > -rw-r--r-- 1 bench perf    0 2012-06-01 16:33 write.lock
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _a.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _a.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _a.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _a.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _a.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _l.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _l.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _l.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _l.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _l.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _w.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _w.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _w.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _w.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _w.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _17.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _17.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _17.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _17.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _17.frq
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1j.cfs
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _1i.fnm
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1k.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1m.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1l.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1n.cfs
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _1i.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _1i.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _1i.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _1i.frq
>> > -rw-r--r-- 1 bench perf 148K 2012-06-01 16:33 _1p.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1o.cfs
>> > -rw-r--r-- 1 bench perf  28M 2012-06-01 16:33 _0.cfx
>> > -rw-r--r-- 1 bench perf 2.8K 2012-06-01 16:33 segments_2
>> > -rw-r--r-- 1 bench perf   20 2012-06-01 16:33 segments.gen
>> >
>> > Deferred merging:
>> >
>> > total 261M
>> > -rw-r--r-- 1 bench perf    0 2012-06-01 16:41 write.lock
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _0.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _3.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _2.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _4.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _6.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _5.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _7.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _9.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _8.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _a.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _c.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _b.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _d.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _f.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _e.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _g.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _i.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _h.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _j.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _l.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _k.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _m.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _n.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _p.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _o.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _q.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _s.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _r.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _t.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _v.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _u.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _w.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _x.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _z.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _y.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _11.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _10.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _13.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _12.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _16.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _15.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _14.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _18.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _17.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1b.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1a.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _19.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1d.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1c.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1g.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1f.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1e.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1j.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1i.cfs
>> > -rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1h.cfs
>> > -rw-r--r-- 1 bench perf  28M 2012-06-01 16:41 _0.cfx
>> > -rw-r--r-- 1 bench perf 137K 2012-06-01 16:42 _1k.cfs
>> > -rw-r--r-- 1 bench perf  12K 2012-06-01 16:42 segments_2
>> > -rw-r--r-- 1 bench perf   20 2012-06-01 16:42 segments.gen
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1l.fnm
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1n.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1l.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1l.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1l.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1l.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1o.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1n.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1n.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1n.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1n.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1p.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1o.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1o.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1o.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1o.frq
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1p.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1p.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1p.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1p.frq
>> > -rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1m.fnm
>> > -rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1m.tis
>> > -rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1m.tii
>> > -rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1m.prx
>> > -rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1m.frq
>>
>> Hmm: you should close the writer (or do a final commit) before testing
>> the size of the index.  I suspect that in the 2nd case, because no final
>> commit happened, the original segments are still around.
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
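
Note on measuring index size: the advice quoted above is to close the
writer (or do a final commit) before checking the size of the index, so that
segments left over from before the merge are not counted. A minimal sketch
of the measuring step alone, using only the Java standard library, might
look like the following. The class and method names (IndexDirSize,
totalBytes) are made up for illustration and are not part of Lucene; the
sketch assumes a flat index directory like the ones in the listings above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper (not a Lucene class): sums the on-disk size of an index directory.
// Assumption: it is run only after the IndexWriter has been closed or has done a final
// commit, as suggested in the thread, so stale pre-merge segments are already gone.
class IndexDirSize {

    // Adds up the sizes of the regular files directly inside dir; subdirectories are ignored.
    public static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory to measure; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.printf("%s: %d bytes%n", dir, totalBytes(dir));
    }
}
```

Running this against each of the 10 index directories after the writers are
closed, and adding up the results, would give a "total disk size" figure
comparable across the concurrent and deferred merging runs.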


