lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-982) Create new method optimize(int maxNumSegments) in IndexWriter
Date Wed, 21 Nov 2007 20:48:43 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544591 ]

Michael McCandless commented on LUCENE-982:
-------------------------------------------


{quote}
I looked at your patch and I'm wondering if it wouldn't make more
sense to limit the overall size of the segments (MB and/or num docs)
involved in a merge rather than the number of segments?
{quote}

Thanks for reviewing :)

I think that's a good idea!

But I'm torn on which is "better" as a first step.

If we limit by size, the benefit is that even as your index grows
very large, the cost of optimizing stays constant once you hit the max
segment size: you keep your optimize cost down.

But then your searches will get slower and slower as your index
grows, since these large segments never get merged (actually you'd
also have to set maxMergeDocs so normal merging wouldn't touch them).
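
For context, a minimal sketch of pinning maxMergeDocs so normal merging
leaves those large segments alone (this is against the 2.3-era IndexWriter
setters as I remember them, so treat the exact calls as an assumption):

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class MaxMergeDocsSketch {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(new RAMDirectory(), new StandardAnalyzer(), true);
    // Segments at or above this many docs are never picked up by normal
    // merging, so a size-capped "optimize" would leave them in place too.
    writer.setMaxMergeDocs(1000000);
    writer.close();
  }
}
{code}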

But by limiting segment count instead, I think you keep your search
costs lower, at the expense of higher and higher optimize costs as
your index gets larger.

I think people optimize because they want to pay a high cost, once,
now, in order to have fast[er] searches.  So by limiting segment count
during optimizing, we still leave the increasing cost (as your index
grows) on the optimize() call.
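
To make that concrete, here's a minimal usage sketch of the proposed
method, assuming the signature from this issue's patch,
optimize(int maxNumSegments), goes in as described:

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class PartialOptimizeSketch {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(new RAMDirectory(), new StandardAnalyzer(), true);
    for (int i = 0; i < 1000; i++) {
      Document doc = new Document();
      doc.add(new Field("id", Integer.toString(i), Field.Store.YES, Field.Index.UN_TOKENIZED));
      writer.addDocument(doc);
    }
    // Pay a bounded cost now: merge down to <= 10 segments instead of 1.
    writer.optimize(10);
    writer.close();
  }
}
{code}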

I think we should eventually do both?

The good news is with the ability to customize MergePolicy, anyone can
customize what it means to "optimize" an index just by implementing
their own MergePolicy.
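
For example (purely illustrative: the class and method below are
hypothetical stand-ins, not the MergePolicy API being formalized in
LUCENE-847), a size-capped policy could plan its merges roughly like this:

{code}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only: not the LUCENE-847 MergePolicy API. */
public class SizeCappedOptimizeSketch {

  /** Stand-in for a segment: a name plus its size in bytes. */
  static class Seg {
    final String name;
    final long sizeInBytes;
    Seg(String name, long sizeInBytes) {
      this.name = name;
      this.sizeInBytes = sizeInBytes;
    }
  }

  /**
   * Group small segments into merges, never letting a merged segment exceed
   * maxMergedBytes; segments already at or over the cap are left alone, which
   * is the "constant optimize cost" behavior discussed above.
   */
  static List<List<Seg>> planMerges(List<Seg> segments, long maxMergedBytes) {
    List<List<Seg>> merges = new ArrayList<List<Seg>>();
    List<Seg> current = new ArrayList<Seg>();
    long currentBytes = 0;
    for (Seg seg : segments) {
      if (seg.sizeInBytes >= maxMergedBytes) {
        continue;                       // too big: never merged again
      }
      if (!current.isEmpty() && currentBytes + seg.sizeInBytes > maxMergedBytes) {
        merges.add(current);            // close out the current merge
        current = new ArrayList<Seg>();
        currentBytes = 0;
      }
      current.add(seg);
      currentBytes += seg.sizeInBytes;
    }
    if (current.size() > 1) {
      merges.add(current);              // a merge of one segment is a no-op
    }
    return merges;
  }
}
{code}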


> Create new method optimize(int maxNumSegments) in IndexWriter
> -------------------------------------------------------------
>
>                 Key: LUCENE-982
>                 URL: https://issues.apache.org/jira/browse/LUCENE-982
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-982.patch
>
>
> Spinning this out from the discussion in LUCENE-847.
> I think having a way to "slightly optimize" your index would be useful
> for many applications.
> The current optimize() call is very expensive for large indices
> because it always optimizes fully down to 1 segment.  If we add a new
> method which instead is allowed to stop optimizing once it has <=
> maxNumSegments segments in the index, this would allow applications to
> e.g. optimize down to, say, <= 10 segments after doing a bunch of updates.
> This should be a nice compromise of gaining good speedups of searching
> while not spending the full (and typically very high) cost of
> optimizing down to a single segment.
> Since LUCENE-847 is now formalizing an API for decoupling merge policy
> from IndexWriter, if we want to add this new optimize method we need
> to take it into account in LUCENE-847.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


