lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] Commented: (LUCENE-1750) Create a MergePolicy that limits the maximum size of it's segments
Date Sun, 19 Jul 2009 05:50:14 GMT


Shai Erera commented on LUCENE-1750:

What happens after several such large segments are created? Wouldn't you want them to be merged
into an even larger segment? Otherwise you'll have many such segments and search performance will
degrade.

I guess I never thought of this as a problem. If I have enough disk space, and my index reaches
600 GB (which is a huge index) split across 10 segments of 60 GB each, I'd want them merged
into one 600 GB segment. It will take eons until I accumulate another such 600 GB segment, no?

Maybe we can have two merge factors: 1) one for small segments, up to a set size threshold,
where we merge regularly; 2) a different one for really large segments. For example, up to
1 GB the merge factor is 10, and beyond that it is 20. That postpones the large, IO-heavy
merges until enough such segments accumulate.
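
The two-tier idea can be sketched as a standalone check of which size tier has accumulated enough segments to be merged. This is not Lucene's API: `TwoTierMergeFactor`, `tiersReadyToMerge` and the MB units are hypothetical, and the 1 GB threshold with merge factors 10 and 20 are only the example values from this comment.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the "two merge factors" idea, not Lucene code.
 * Segment sizes are in MB; threshold and factors are the example values.
 */
public class TwoTierMergeFactor {

    static final long SMALL_TIER_MAX_MB = 1024; // segments up to 1 GB count as "small"
    static final int SMALL_MERGE_FACTOR = 10;
    static final int LARGE_MERGE_FACTOR = 20;

    /** Returns the tiers that have accumulated enough segments to trigger a merge. */
    static List<String> tiersReadyToMerge(List<Long> segmentsMb) {
        int small = 0;
        int large = 0;
        for (long size : segmentsMb) {
            if (size <= SMALL_TIER_MAX_MB) {
                small++;
            } else {
                large++;
            }
        }
        List<String> ready = new ArrayList<>();
        if (small >= SMALL_MERGE_FACTOR) ready.add("small");
        // Large segments wait for more peers, which postpones the expensive merges.
        if (large >= LARGE_MERGE_FACTOR) ready.add("large");
        return ready;
    }

    public static void main(String[] args) {
        List<Long> segs = new ArrayList<>();
        for (int i = 0; i < 10; i++) segs.add(100L);  // ten 100 MB segments
        for (int i = 0; i < 15; i++) segs.add(1536L); // fifteen 1.5 GB segments
        System.out.println(tiersReadyToMerge(segs));  // prints [small]
    }
}
```

With fifteen 1.5 GB segments the large tier is not yet due, so only the ten small segments would be merged; the large merge waits until twenty large segments exist.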

Also, with the current proposal, how will optimize work? Will it skip the very large segments,
or will they be included too?
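
The two possible answers to the optimize question can be contrasted with a toy model. `OptimizeWithCap`, `optimize` and `capMb` are hypothetical names, not Lucene's `IndexWriter.optimize()`; the model simply merges every segment selected for optimization into one, optionally leaving segments at or above the cap untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration of the optimize question, not Lucene code.
 * Sizes are in MB; capMb stands in for the proposed maximum segment size.
 */
public class OptimizeWithCap {

    /** Merges all selected segments into one and returns the resulting segment sizes. */
    static List<Long> optimize(List<Long> segmentsMb, long capMb, boolean skipLargeSegments) {
        List<Long> result = new ArrayList<>();
        long merged = 0;
        int mergedCount = 0;
        for (long size : segmentsMb) {
            if (skipLargeSegments && size >= capMb) {
                result.add(size); // left untouched by optimize
            } else {
                merged += size;   // folded into the single optimized segment
                mergedCount++;
            }
        }
        if (mergedCount > 0) result.add(merged);
        return result;
    }

    public static void main(String[] args) {
        List<Long> segs = Arrays.asList(2048L, 2048L, 300L, 50L);
        System.out.println(optimize(segs, 2048, true));  // prints [2048, 2048, 350]
        System.out.println(optimize(segs, 2048, false)); // prints [4446]
    }
}
```

Skipping the large segments keeps optimize cheap but means an optimized index no longer has a single segment; including them brings back the single huge merge the issue is trying to avoid.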

> Create a MergePolicy that limits the maximum size of it's segments
> ------------------------------------------------------------------
>                 Key: LUCENE-1750
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 3.1
>         Attachments: LUCENE-1750.patch
>   Original Estimate: 48h
>  Remaining Estimate: 48h
> Basically I'm trying to create largish 2-4GB shards using
> LogByteSizeMergePolicy; however, the attached unit test shows
> segments that exceed maxMergeMB.
> The goal is for segments to be merged up to 2GB, at which point all
> merging into that segment stops and another 2GB segment is
> started. This helps with replication in Solr: if a single
> optimized 60GB segment is created, the machine stops working due
> to IO and CPU starvation.
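
One plausible reading of the reported behavior, stated with some hedging: LogByteSizeMergePolicy's maxMergeMB is documented as the largest segment that may be merged with other segments, so it caps a merge's inputs rather than its output, and ten segments under the cap can still merge into one far above it. The sketch below is a standalone simulation, not Lucene code; `SegmentCapSimulation`, `simulate` and the numbers (100 flushed 100 MB segments, merge factor 10, 2048 MB cap) are hypothetical. It contrasts capping the inputs with refusing any merge whose result would exceed the cap, which is the behavior the issue asks for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical simulation, not Lucene code. Segment sizes are in MB.
 * capOutput == false: the cap only limits which segments may be merged.
 * capOutput == true:  a merge is allowed only if its result stays within the cap.
 */
public class SegmentCapSimulation {

    static List<Long> simulate(List<Long> segmentsMb, int mergeFactor, long capMb, boolean capOutput) {
        List<Long> segs = new ArrayList<>(segmentsMb);
        while (true) {
            // Segments below the cap are merge candidates, smallest first.
            List<Long> eligible = new ArrayList<>();
            for (long s : segs) {
                if (s < capMb) eligible.add(s);
            }
            Collections.sort(eligible);

            List<Long> picked = new ArrayList<>();
            long merged = 0;
            boolean hitCap = false;
            for (long s : eligible) {
                if (picked.size() == mergeFactor) break;
                // With an output cap, stop before the merged result would exceed capMb.
                if (capOutput && merged + s > capMb) {
                    hitCap = true;
                    break;
                }
                picked.add(s);
                merged += s;
            }

            // Merge a full group, or a smaller group (2+) only when the cap cut it short.
            boolean full = picked.size() == mergeFactor;
            boolean cappedPartial = hitCap && picked.size() >= 2;
            if (!full && !cappedPartial) return segs;

            for (long s : picked) segs.remove(Long.valueOf(s)); // drop one occurrence of each merged size
            segs.add(merged);
        }
    }

    public static void main(String[] args) {
        List<Long> flushed = new ArrayList<>(Collections.nCopies(100, 100L)); // 100 x 100 MB
        System.out.println(simulate(flushed, 10, 2048, false)); // prints [10000]
        System.out.println(simulate(flushed, 10, 2048, true));  // prints [2000, 2000, 2000, 2000, 2000]
    }
}
```

With only the inputs capped, the ten 1000 MB segments are all still "small enough" and merge into a single 10000 MB segment; with the output capped, segments settle just under 2 GB and new data goes into fresh segments.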

This message is automatically generated by JIRA.
