lucene-dev mailing list archives

From "Uwe Schindler (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3141) Deprecate OPTIMIZE command in Solr
Date Sun, 19 Feb 2012 16:10:37 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211394#comment-13211394 ]

Uwe Schindler commented on SOLR-3141:
-------------------------------------

I'll just repeat here what Mike already posted on the Lucene issue:

{quote}
Some quick googling uncovers depressing examples of over-optimizing:

* https://jira.duraspace.org/browse/FCREPO-155
* http://stackoverflow.com/questions/3912253/is-it-mandatory-to-optimize-the-lucene-index-after-write
* http://issues.liferay.com/browse/LPS-2944
* http://download.oracle.com/docs/cd/E19316-01/820-7054/girqf/index.html
* https://issues.sonatype.org/browse/MNGECLIPSE-2359
* http://blog.inflinx.com/tag/lucene

That last one has this fun comment:

{code:java}
// Lucene recommends calling optimize upon completion of indexing
writer.optimize();
{code}
{quote}

Most of the above items also affect Solr. E.g. the first one (I know people from FIZ Karlsruhe
and Fedora) is really funny: Fedora GSearch calls optimize=true on every add of a single document
to Solr. I even know people who use Solr and have complained about GSearch because of this.

We can fix those horrible user-code bugs very quickly by making optimize a no-op in Solr; they
will all appreciate that. To repeat: nobody's installation would break, it would just
get faster.

One funny detail: with Lucene 3.x, search actually gets faster with multiple segments if you
do parallel ExecutorService-based search (I still don't really recommend using an ExecutorService
on IndexSearcher...). On the other hand, searching a non-optimized pre-2.9 index, which had no
per-segment search, really was slower, as MultiTermsEnum and MultiDocsEnum were used.

With Lucene 3.x there is really no slowdown at all caused by multiple segments, as each segment
is searched on its own with no interaction and the results are just added to the same priority queue.
I agree that Solr has some problems with faceting, but people should use per-segment faceting
and not optimize; this would improve their installations immensely (the actual faceting
might get slower, but on the other hand FieldCaches can be reused, so it actually gets faster).
The current default is global faceting and (for most installations) "optimize on *every*
single item added" (see the links above).
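
The "each segment is searched on its own, results go into the same priority queue" idea can be
sketched without any Lucene classes. This is only an illustration under stated assumptions, not
Lucene's actual collector code: each segment is modelled as a hypothetical array of precomputed
scores (a score of 0 or less means "no match"), and PerSegmentTopK, Hit and search are made-up names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: a toy model of per-segment search, not a Lucene API.
final class PerSegmentTopK {

    /** A scored hit; docId is global (segment base + segment-local id). */
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Orders hits by ascending score, so the queue head is the weakest hit. */
    static final Comparator<Hit> BY_SCORE = Comparator.comparingDouble(h -> h.score);

    /**
     * Scans every segment independently and keeps the k best hits in ONE shared
     * min-heap. Segments never interact; only the queue is shared.
     */
    static List<Hit> search(float[][] segments, int k) {
        PriorityQueue<Hit> queue = new PriorityQueue<>(BY_SCORE);
        int docBase = 0; // offset that turns segment-local ids into global ids
        for (float[] segment : segments) {
            for (int local = 0; local < segment.length; local++) {
                float score = segment[local];
                if (score <= 0f) {
                    continue; // assumption of this sketch: non-positive means no match
                }
                queue.add(new Hit(docBase + local, score));
                if (queue.size() > k) {
                    queue.poll(); // evict the current weakest hit
                }
            }
            docBase += segment.length;
        }
        List<Hit> top = new ArrayList<>(queue);
        top.sort(BY_SCORE.reversed()); // best hit first
        return top;
    }

    public static void main(String[] args) {
        // Three "segments" of different sizes, as in a non-optimized index.
        float[][] segments = { {0.2f, 0.9f}, {0.5f}, {0.0f, 0.7f, 0.1f} };
        for (Hit hit : search(segments, 3)) {
            System.out.println("doc=" + hit.docId + " score=" + hit.score);
        }
    }
}
```

The cost of the merge step depends on k, not on the number of segments, which is why extra
segments add so little overhead in this model.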
                
> Deprecate OPTIMIZE command in Solr
> ----------------------------------
>
>                 Key: SOLR-3141
>                 URL: https://issues.apache.org/jira/browse/SOLR-3141
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 3.5
>            Reporter: Jan Høydahl
>              Labels: force, optimize
>             Fix For: 3.6
>
>
> Background: LUCENE-3454 renames optimize() as forceMerge(). Please read that issue first.
> Now that optimize() is rarely necessary anymore, and renamed in Lucene APIs, what should
> be done with Solr's ancient optimize command?


