lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance
Date Mon, 01 Oct 2007 20:31:51 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531623
] 

Hoss Man commented on LUCENE-994:
---------------------------------

lucene's "commitment" to backward compatibility requires that 2.X be API compatible with all
previous 2.Y releases...

http://wiki.apache.org/jakarta-lucene/BackwardsCompatibility

...the performance characteristics and file formats (and merge behavior) doesn't need to be
exactly the same (so switching the Merge Policy should be fine) as long as a note is made
about this in the "run time behavior" section of CHANGES.txt (we already have one) ... but
any code that could compile and run against 2.2 should still run against 2.3 ... it might
be really slow without some minor tweaks, and it might violate "principle of least surprise"
but it's important that it still run so that people don't have to fear a minor version upgrade.

if we can't provide that -- then the release should be called 3.0, but i haven't seen anything
that least  me to think it's not possible.  (it just might not be pretty)

> Change defaults in IndexWriter to maximize "out of the box" performance
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-994
>                 URL: https://issues.apache.org/jira/browse/LUCENE-994
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-994.patch, writerinfo.zip
>
>
> This is follow-through from LUCENE-845, LUCENE-847 and LUCENE-870;
> I'll commit this once those three are committed.
> Out of the box performance of IndexWriter is maximized when flushing
> by RAM instead of a fixed document count (the default today) because
> documents can vary greatly in size.
> Likewise, merging performance should be faster when merging by net
> segment size since, to minimize the net IO cost of merging segments
> over time, you want to merge segments of equal byte size.
> Finally, ConcurrentMergeScheduler improves indexing speed
> substantially (25% in a simple initial test in LUCENE-870) because it
> runs the merges in the backround and doesn't block
> add/update/deleteDocument calls.  Most machines have concurrency
> between CPU and IO and so it makes sense to default to this
> MergeScheduler.
> Note that these changes will break users of ParallelReader because the
> parallel indices will no longer have matching docIDs.  Such users need
> to switch IndexWriter back to flushing by doc count, and switch the
> MergePolicy back to LogDocMergePolicy.  It's likely also necessary to
> switch the MergeScheduler back to SerialMergeScheduler to ensure
> deterministic docID assignment.
> I think the combination of these three default changes, plus other
> performance improvements for indexing (LUCENE-966, LUCENE-843,
> LUCENE-963, LUCENE-969, LUCENE-871, etc.) should make for some sizable
> performance gains Lucene 2.3!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message