lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: ConcurrentMergeScheduler and MergePolicy question
Date Mon, 03 Aug 2009 19:59:36 GMT
Michael McCandless wrote:
> On the impact of search performance for large vs small mergeFactors, I
> think the jury is still out.  People should keep testing that (and
> report back!).  Certainly, for the fastest reopen time you never want
> any merging to be done :)
>   
Here is the original exchange I referenced:

 >>On Fri, Apr 10, 2009 at 3:06 PM, Mark Miller <markrmiller@gmail.com> wrote:
 >>    24 segments is bound to be quite a bit slower than an optimized
 >>    index for most things

 >I'd be curious just how true this really is (in general)... my guess
 >is the "long tail of tiny segments" gets into the OS's IO cache (as
 >long as the system stays hot) and doesn't actually hurt things much.
 >
 >Has anyone tested this (performance of unoptimized vs optimized
 >indexes, in general) recently?  To be a fair comparison, there should
 >be no deletions in the index.
 >
 >Mike

After reading that, I played with some sorting code I had and did a 
quick, cheesy test or two: one segment vs. 10 or 20. In that horrible 
test (based on the stress sort code), I don't remember seeing much of a 
difference. No sorting. Very, very unscientific, quick and dirty.

This time I loaded up 1.3 million Wikipedia articles, gave the test 
768MB of RAM, warmed the Searcher with lots of searching before each 
measurement, and compared 1 segment vs. 5. The optimized index was 15-20% 
faster with the queries I was using (approx. 100 queries targeted at 
Wikipedia). It's an odd test system: Ubuntu, a quad-core laptop with slow 
laptop drives and 4 GB of RAM. Still not very scientific, but better 
than before.
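
For anyone repeating this kind of comparison, the warm-then-measure procedure described above might be sketched roughly as follows. This is not the code behind the numbers in this message. The names `time_after_warmup`, `percent_faster` and `run_queries` are hypothetical, and "X% faster" is assumed here to mean the reduction in elapsed time relative to the multi-segment run, which the message does not actually specify.

```python
import time


def time_after_warmup(run_queries, warmup_rounds=3, measured_rounds=5):
    """Run the workload untimed to warm caches, then return the best timed round (seconds).

    `run_queries` is a hypothetical zero-argument callable that executes the
    full query set against one index (e.g. the 1-segment or the 5-segment one).
    """
    for _ in range(warmup_rounds):
        run_queries()  # warm-up only; timing deliberately ignored
    timings = []
    for _ in range(measured_rounds):
        start = time.perf_counter()
        run_queries()
        timings.append(time.perf_counter() - start)
    return min(timings)


def percent_faster(baseline_seconds, candidate_seconds):
    """Reduction in elapsed time of the candidate relative to the baseline, in percent.

    Assumption: "faster" is measured as time saved over baseline time, not as
    a throughput ratio.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return 100.0 * (baseline_seconds - candidate_seconds) / baseline_seconds
```

With made-up timings (not measurements from this test), `percent_faster(10.0, 8.0)` evaluates to `20.0`. Taking the minimum over several timed rounds is one common way to reduce noise from the OS and the JIT; a mean or median would be an equally defensible choice.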


Here is the benchmark I was using in various forms:

{ "Rounds"

    ResetSystemErase

    { "Populate"
        -CreateIndex
        { "MAddDocs" AddDoc > : 15000
        -CloseIndex
    }
    { "test"
        OpenReader 
        { "WarmRdrDocs" Warm > : 50
        { "WarmRdr" Search > : 5000
        { "SearchSameRdr" Search > : 50000
        CloseReader
                       
        OpenIndex
        PrintSegmentCount
        Optimize   
        CloseIndex         
        NewRound
    } : 2
 }

RepSumByName
RepSumByPrefRound SearchSameRdr


I also did a quick profile of a 15k-document index, 1 segment vs. 10 segments. I profiled 
each for approx. 11 million calls of readVInt. The hotspot results are below.

http://myhardshadow.com/images/1seg.png
http://myhardshadow.com/images/10seg.png


Just a quick start at looking into this from over the weekend.

-- 
- Mark

http://www.lucidimagination.com





