lucene-dev mailing list archives

From "Michael McCandless (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3454) rename optimize to a less cool-sounding name
Date Sun, 06 Nov 2011 15:52:51 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145024#comment-13145024 ]

Michael McCandless commented on LUCENE-3454:
--------------------------------------------


How about the name "forceMerge(int)" instead?

Fundamentally, this is a different operation from maybeMerge() because
that method only does "natural" merges, i.e. ones that the merge policy
(MP) has selected on its own.

forceMerge, by contrast, means you are forcing the MP to do merging
that it would not otherwise have chosen to do.
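
To make the distinction concrete, here is a rough toy model, not Lucene code.
Only the names maybeMerge() and forceMerge(int) come from the proposal above;
the ToyIndex class, its mergeFactor parameter and the "merge only when
mergeFactor segments have equal size" rule are assumptions invented purely
for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: a "segment" is just a doc count. ToyIndex, mergeFactor and the
// equal-size rule are hypothetical and are NOT Lucene's IndexWriter/MergePolicy API.
final class ToyIndex {
    private final List<Integer> segments = new ArrayList<>();
    private final int mergeFactor;

    ToyIndex(int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        this.mergeFactor = mergeFactor;
    }

    void addSegment(int docCount) { segments.add(docCount); }

    int segmentCount() { return segments.size(); }

    int totalDocs() {
        int sum = 0;
        for (int s : segments) sum += s;
        return sum;
    }

    // "Natural" merging: the (toy) policy merges only when it finds
    // mergeFactor segments of identical size; otherwise it does nothing.
    void maybeMerge() {
        boolean merged = true;
        while (merged) {
            merged = false;
            Collections.sort(segments);
            for (int i = 0; i + mergeFactor <= segments.size(); i++) {
                // The list is sorted, so equal endpoints mean the whole run is equal.
                if (segments.get(i).equals(segments.get(i + mergeFactor - 1))) {
                    int sum = 0;
                    for (int j = 0; j < mergeFactor; j++) sum += segments.remove(i);
                    segments.add(sum);
                    merged = true;
                    break;
                }
            }
        }
    }

    // Forced merging: the caller overrides the policy and insists on at most
    // maxSegments segments, whether or not the policy would have chosen that.
    void forceMerge(int maxSegments) {
        if (maxSegments < 1) throw new IllegalArgumentException("maxSegments must be >= 1");
        if (segments.size() <= maxSegments) return;
        Collections.sort(segments);
        int toMerge = segments.size() - maxSegments + 1;
        int mergedDocs = 0;
        for (int i = 0; i < toMerge; i++) mergedDocs += segments.remove(0);
        segments.add(mergedDocs);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex(3);
        for (int docs : new int[] {10, 10, 5, 7}) index.addSegment(docs);
        index.maybeMerge();   // no three equal-sized segments, so nothing happens
        System.out.println("after maybeMerge:  " + index.segmentCount() + " segments");
        index.forceMerge(1);  // caller forces everything into a single segment
        System.out.println("after forceMerge(1): " + index.segmentCount() + " segments");
    }
}
```

In this sketch maybeMerge() can legitimately be a no-op, while forceMerge(1)
always rewrites whatever is needed to reach the requested segment count.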

I don't like names like compact/defragment since they still imply this
is a sort of necessary periodic maintenance that you are expected / need
to call.

The fact is, Lucene has made excellent progress on getting good
performance on multi-segment indexes: query rewriting (e.g. MTQ) and
searching are per-segment.  TieredMergePolicy now targets segments with
deletions, and can merge out-of-order, etc.  Reducing the index down
to 1 segment is rarely justified given the cost (yes, there are times,
like a fully static index, but this is rare).

The goal here is to discourage "typical" users from calling
optimize ("expert" users will of course find the method and use it,
hopefully in the "right" cases).

The API is badly trappy today; we've seen this over and over now (I
just got a private email a few days ago... when I asked why they
optimize after every "batch" they said "because it just seemed like
the right thing to do").  We've all seen many users fall into this
trap.

We can try to debate why this is so... I don't think it's because they
are "morons".  I think there are many other explanations.  E.g., our own
FAQs, javadocs, the Lucene in Action book, tutorials, etc., all
frequently "suggested" optimize in the past.  I think, also, users
often don't realize Lucene has "segments" and that optimize means
these segments are "fully rewritten", which in turn implies O(N^2)
cost if you call it after every doc/batch, etc.  These things are obvious
to Lucene developers, but not so to users.
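
The O(N^2) point can be sketched with a back-of-the-envelope cost model.
This is an illustration under stated assumptions, not a measurement: the
class and method names are hypothetical, "docs written" stands in for I/O
cost, every batch has the same size, and natural merges are ignored.

```java
// Hypothetical cost model: counts documents written to disk as a proxy for I/O.
// It does not call Lucene and ignores natural merges, deletions and compression.
final class MergeCostSketch {

    // Every batch is followed by a full rewrite of the whole index into one
    // segment, so batch k rewrites k * docsPerBatch documents.
    // (Simplification: the first batch is charged a rewrite too.)
    static long docsRewrittenByOptimizeEveryBatch(int batches, int docsPerBatch) {
        long rewritten = 0;
        long indexSize = 0;
        for (int b = 0; b < batches; b++) {
            indexSize += docsPerBatch;
            rewritten += indexSize;
        }
        return rewritten;
    }

    // Baseline: each document is written once, when its batch is flushed.
    static long docsWrittenByFlushOnly(int batches, int docsPerBatch) {
        return (long) batches * docsPerBatch;
    }

    public static void main(String[] args) {
        int docsPerBatch = 10;
        for (int batches : new int[] {100, 200, 400}) {
            System.out.println(batches + " batches: flush only = "
                + docsWrittenByFlushOnly(batches, docsPerBatch)
                + ", full rewrite every batch = "
                + docsRewrittenByOptimizeEveryBatch(batches, docsPerBatch));
        }
    }
}
```

Under these assumptions the rewrite total is docsPerBatch * N * (N + 1) / 2
for N batches, so doubling the number of batches roughly quadruples the
work, while the flush-only baseline merely doubles.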

                
> rename optimize to a less cool-sounding name
> --------------------------------------------
>
>                 Key: LUCENE-3454
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3454
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.4, 4.0
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>         Attachments: LUCENE-3454.patch
>
>
> I think users see the name optimize and feel they must do this, because who wants a suboptimal system? but this probably just results in wasted time and resources.
> maybe rename to collapseSegments or something?

