lucene-dev mailing list archives

From "Steven Parkes (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter
Date Thu, 19 Apr 2007 22:42:16 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490192 ]

Steven Parkes commented on LUCENE-847:
--------------------------------------

Here are some numbers comparing the load performance for the factored vs. non-factored merge
policies.

The setup uses enwiki, loads 200K documents, and uses 4 different combinations of maxBufferedDocs
and mergeFactor (just the default from the standard benchmark, not because I necessarily thought
it was a good idea.)

The factored merge policy seems to be on the order of 1% slower loading than the non-factored
version ... and I'm not sure why, so I'm going to check into this. The factored version does
more examination of the segment list than the non-factored version, so there's compute overhead,
but I would expect that to be swamped by I/O. Maybe that's not a good assumption? Or it might
be doing different merges for reasons I haven't considered, which I'm going to check.

Relating this to some of the merge discussions, I'm going to look at monitoring both the number
of merges taking place and the size of those merges. I think that's helpful in understanding
different candidate merge policies, in addition to the absolute difference in runtime.
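
A minimal sketch of what such monitoring could look like, assuming a hypothetical hook that is
called once per merge with the number of documents in the segments being merged. MergeStats and
recordMerge are illustrative names, not part of Lucene or of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Collects the number and size of merges. Not tied to any Lucene API. */
final class MergeStats {
    // One entry per merge: total documents in the segments that were merged.
    private final List<Long> mergedDocCounts = new ArrayList<>();

    /** Hypothetical hook: call once per merge. */
    void recordMerge(long docsMerged) {
        if (docsMerged < 0) {
            throw new IllegalArgumentException("docsMerged must be >= 0");
        }
        mergedDocCounts.add(docsMerged);
    }

    int mergeCount() {
        return mergedDocCounts.size();
    }

    long totalDocsMerged() {
        long sum = 0;
        for (long n : mergedDocCounts) {
            sum += n;
        }
        return sum;
    }

    long largestMerge() {
        long max = 0;
        for (long n : mergedDocCounts) {
            max = Math.max(max, n);
        }
        return max;
    }

    /** Average number of times each added document was rewritten by a merge. */
    double docsRewrittenPerDocAdded(long docsAdded) {
        if (docsAdded <= 0) {
            throw new IllegalArgumentException("docsAdded must be > 0");
        }
        return (double) totalDocsMerged() / docsAdded;
    }

    public static void main(String[] args) {
        MergeStats stats = new MergeStats();
        // Made-up merge sizes, only to exercise the class.
        stats.recordMerge(100);
        stats.recordMerge(100);
        stats.recordMerge(1000);
        System.out.println("merges=" + stats.mergeCount()
                + " totalDocsMerged=" + stats.totalDocsMerged()
                + " largest=" + stats.largestMerge());
    }
}
```

Comparing mergeCount and docsRewrittenPerDocAdded between two policies on the same input would
show whether they perform different merges, independent of wall-clock time.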

I also think histogramming the per-doc cost would be interesting, since mitigating the
long delay at cascading merges is at least one goal of a concurrent merge policy.
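
One way to histogram the per-doc cost is sketched below, assuming each add is timed with
System.nanoTime() and binned into power-of-two buckets. LatencyHistogram is an illustrative
name, and the timed loop in main uses a stand-in workload rather than a real IndexWriter call.

```java
/** Power-of-two histogram of per-document latencies in nanoseconds. */
final class LatencyHistogram {
    // buckets[i] counts samples with floor(log2(nanos)) == i; samples <= 1 ns go to bucket 0.
    private final long[] buckets = new long[64];

    static int bucketFor(long nanos) {
        return nanos <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(nanos);
    }

    void record(long nanos) {
        buckets[bucketFor(nanos)]++;
    }

    long count(int bucket) {
        return buckets[bucket];
    }

    void print() {
        for (int i = 0; i < buckets.length; i++) {
            if (buckets[i] > 0) {
                System.out.println(">= 2^" + i + " ns: " + buckets[i]);
            }
        }
    }

    public static void main(String[] args) {
        LatencyHistogram histogram = new LatencyHistogram();
        long sink = 0;
        for (int doc = 0; doc < 1000; doc++) {
            long start = System.nanoTime();
            // Stand-in for adding one document; every 100th iteration does much more work
            // to mimic a slow add.
            int work = (doc % 100 == 0) ? 1_000_000 : 1_000;
            for (int i = 0; i < work; i++) {
                sink += i;
            }
            histogram.record(System.nanoTime() - start);
        }
        histogram.print();
        System.out.println("sink=" + sink); // keeps the loop from being optimized away
    }
}
```

Documents whose add triggers a cascading merge would land in the high buckets, so the tail of
the histogram is the part a concurrent merge policy would be expected to shrink.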

And all this doesn't even consider testing the recent stuff on merging multiple indexes. That's
an area where the factored merge policy differs (because of the simpler interface.)

I'm curious if anyone is surprised by these numbers, the 60 docs/sec, in particular. This
machine is a dual dual-core xeon, writing to a single local disk.  My dual opty achieved ~85-100
docs/sec on a three disk SATA3 RAID5 array.

Non-factored (current) merge policy

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (8 about 8 out of 33)
     [java] Operation       round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec   avgUsedMem    avgTotalMem
     [java] MAddDocs_200000     0  10  10        1       200000         41.6    4,804.11   11,758,928     12,591,104
     [java] MAddDocs_200000 -   1 100  10 -  -   1 -  -  200000 -  -  - 50.0 -  4,000.25 - 34,831,992 -   52,563,968
     [java] MAddDocs_200000     2  10 100        1       200000         49.9    4,004.95   42,158,232     60,444,672
     [java] MAddDocs_200000 -   3 100 100 -  -   1 -  -  200000 -  -  - 57.9 -  3,455.97 - 45,646,680 -   61,083,648
     [java] MAddDocs_200000     4  10  10        1       200000         44.9    4,458.66   36,928,616     61,083,648
     [java] MAddDocs_200000 -   5 100  10 -  -   1 -  -  200000 -  -  - 50.4 -  3,965.98 - 47,855,064 -   61,083,648
     [java] MAddDocs_200000     6  10 100        1       200000         49.7    4,023.51   52,506,448     64,217,088
     [java] MAddDocs_200000 -   7 100 100 -  -   1 -  -  200000 -  -  - 57.9 -  3,451.39 - 64,466,128 -   73,220,096

Factored (new) merge policy

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (8 about 8 out of 33)
     [java] Operation       round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec   avgUsedMem    avgTotalMem
     [java] MAddDocs_200000     0  10  10        1       200000         41.4    4,828.25   10,477,976     12,386,304
     [java] MAddDocs_200000 -   1 100  10 -  -   1 -  -  200000 -  -  - 50.4 -  3,968.27 - 38,333,544 -   46,170,112
     [java] MAddDocs_200000     2  10 100        1       200000         50.3    3,973.52   33,539,824     63,860,736
     [java] MAddDocs_200000 -   3 100 100 -  -   1 -  -  200000 -  -  - 58.6 -  3,413.87 - 44,580,528 -   87,781,376
     [java] MAddDocs_200000     4  10  10        1       200000         45.3    4,411.50   57,850,104     87,781,376
     [java] MAddDocs_200000 -   5 100  10 -  -   1 -  -  200000 -  -  - 51.0 -  3,921.48 - 62,793,432 -   87,781,376
     [java] MAddDocs_200000     6  10 100        1       200000         50.4    3,969.87   49,625,496     93,966,336
     [java] MAddDocs_200000 -   7 100 100 -  -   1 -  -  200000 -  -  - 58.7 -  3,409.51 - 68,100,288 -  129,572,864
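
The per-round relative difference between the two runs can be computed from the elapsedSec
columns of the two reports. The sketch below hard-codes those values; RoundComparison and
percentSlower are illustrative names. A positive result means the factored run took longer in
that round, a negative result means it finished sooner.

```java
/** Compares elapsed times of two benchmark runs round by round. */
final class RoundComparison {
    /** Percent difference of candidate relative to baseline; positive means candidate took longer. */
    static double percentSlower(double baselineSec, double candidateSec) {
        return (candidateSec - baselineSec) / baselineSec * 100.0;
    }

    public static void main(String[] args) {
        // elapsedSec for rounds 0..7, copied from the two reports.
        double[] nonFactored = {4804.11, 4000.25, 4004.95, 3455.97, 4458.66, 3965.98, 4023.51, 3451.39};
        double[] factored = {4828.25, 3968.27, 3973.52, 3413.87, 4411.50, 3921.48, 3969.87, 3409.51};

        for (int round = 0; round < nonFactored.length; round++) {
            System.out.printf("round %d: %+.2f%%%n",
                    round, percentSlower(nonFactored[round], factored[round]));
        }
    }
}
```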


> Factor merge policy out of IndexWriter
> --------------------------------------
>
>                 Key: LUCENE-847
>                 URL: https://issues.apache.org/jira/browse/LUCENE-847
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Steven Parkes
>         Assigned To: Steven Parkes
>         Attachments: concurrentMerge.patch, LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable, making it
> possible for apps to choose a custom merge policy and for easier experimenting with merge
> policy variants.


