lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: MergePolicy Thresholds
Date Mon, 02 May 2011 13:56:07 GMT
I did look at it, but I don't see that it addresses this particular need
(ending up with segments no bigger than X). Perhaps I could get close by
tweaking several parameters (e.g. maxLarge/SmallNumSegments +
maxMergeSizeMB), but it's not clear what the right combination is.

That relates to one of my points -- isn't it more intuitive for an app to
set this one threshold (if it needs any thresholds at all) than to tweak all
of those parameters? If so, then we only need two thresholds (size +
mergeFactor), and we can reuse BalancedMP's findBalancedMerges logic
(perhaps w/ some adaptations) to derive a merge plan.
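
To make the two-threshold idea concrete, here is a minimal sketch. It is an
assumption-laden illustration, not Lucene code: the class
MaxResultSizeMergeSketch and the method pickMerge are hypothetical names, the
method does not reproduce BalancedMP's findBalancedMerges, and segments are
modeled as plain sizes in MB. It greedily takes the largest segments that
still fit under the result-size cap, up to mergeFactor of them, so it only
approximates "maximize the result segment size".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a two-threshold merge selection; not Lucene's MergePolicy API. */
public class MaxResultSizeMergeSketch {

    /**
     * Picks at most mergeFactor segments whose combined size stays within
     * maxResultSegmentSizeMB, trying the largest segments first (greedy, so
     * the result is not guaranteed to be the largest possible merge).
     * Returns an empty list when fewer than two segments can be combined.
     */
    public static List<Long> pickMerge(List<Long> segmentSizesMB,
                                       long maxResultSegmentSizeMB,
                                       int mergeFactor) {
        List<Long> sorted = new ArrayList<>(segmentSizesMB);
        sorted.sort(Collections.reverseOrder()); // largest first

        List<Long> picked = new ArrayList<>();
        long total = 0;
        for (long size : sorted) {
            if (picked.size() == mergeFactor) {
                break; // never merge more than mergeFactor segments at once
            }
            if (total + size <= maxResultSegmentSizeMB) {
                picked.add(size);
                total += size;
            }
        }
        // A "merge" of zero or one segment is pointless.
        return picked.size() >= 2 ? picked : Collections.<Long>emptyList();
    }

    public static void main(String[] args) {
        long gb = 1024; // sizes are expressed in MB
        List<Long> sizes = Arrays.asList(5 * gb, 7 * gb, 1 * gb, 512L);
        // With a 20 GB result cap, the 5 GB and 7 GB segments are eligible,
        // unlike under a 2 GB per-segment threshold.
        System.out.println(pickMerge(sizes, 20 * gb, 10));
    }
}
```

With a 20 GB cap and mergeFactor=10, the 5 GB and 7 GB segments from the
example in the original mail are selected together with the smaller ones,
which is the behavior a per-segment maxMergeMB threshold cannot give.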

Shai

On Mon, May 2, 2011 at 4:42 PM, Earwin Burrfoot <earwin@gmail.com> wrote:

> Have you checked BalancedSegmentMergePolicy? It has some more knobs :)
>
> On Mon, May 2, 2011 at 17:03, Shai Erera <serera@gmail.com> wrote:
> > Hi
> >
> > Today, LogMP allows you to set different thresholds for segment sizes,
> > thereby allowing you to control the largest segment that will be
> > considered for merge + the largest segment your index will hold (=~
> > threshold * mergeFactor).
> >
> > So, if you want to end up w/ say 20GB segments, you can set
> > maxMergeMB(ForOptimize) to 2GB and mergeFactor=10.
> >
> > However, this often does not achieve your desired goal -- if the index
> > contains 5 and 7 GB segments, they will never be merged b/c they are
> > bigger than the threshold. I am willing to spend the CPU and IO resources
> > to end up w/ 20 GB segments, whether I'm merging 10 segments together or
> > only 2. After I reach a 20GB segment, it can rest peacefully, at least
> > until I increase the threshold.
> >
> > So I wonder, first, if this threshold (i.e., largest segment size you
> > would like to end up with) is more natural to set than the current
> > thresholds, from the application level? I.e., wouldn't it be a simpler
> > threshold to set instead of doing weird calculus that depends on
> > maxMergeMB(ForOptimize) and mergeFactor?
> >
> > Second, should this be an addition to LogMP, or a different type of
> > MP? One that adheres to only those two factors (perhaps the segSize
> > threshold should be allowed to be set differently for optimize and
> > regular merges). It can pick segments for merge such that it maximizes
> > the result segment size (i.e., it doesn't necessarily merge in
> > sequential order), but never merges more than mergeFactor segments.
> >
> > I guess, if we think that maxResultSegmentSizeMB is more intuitive than
> > the current thresholds, application-wise, then this change should go
> > into LogMP. Otherwise, it feels like a different MP is needed, because
> > LogMP is already complicated and another threshold would confuse things.
> >
> > What do you think of this? Am I trying to optimize too much? :)
> >
> > Shai
> >
> >
>
>
>
> --
> Kirill Zakharenko/Кирилл Захаренко
> E-Mail/Jabber: earwin@gmail.com
> Phone: +7 (495) 683-567-4
> ICQ: 104465785
>
