lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <>
Subject [jira] Commented: (LUCENE-1750) Create a MergePolicy that limits the maximum size of it's segments
Date Mon, 20 Jul 2009 22:51:14 GMT


Jason Rutherglen commented on LUCENE-1750:

> Wouldn't you want them to be merged into an even larger

I think once the segment reaches the limit (e.g. 4GB), it's
effectively done and nothing more happens to it, unless it
accumulates too many deletes (as a percentage of docs), in which
case it can be compacted and new segments merged into it.
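
As a rough illustration of that rule, here is a minimal plain-Java
sketch. It is not Lucene API: the class, the Seg type, and both
thresholds (maxSegmentBytes, maxDeletePct) are hypothetical names
made up for this example. A segment below the cap stays mergeable;
a segment at or above the cap is only reopened once its deleted
docs exceed the given percentage.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Lucene API.
class SizeCappedSelection {

    // Minimal stand-in for per-segment statistics.
    static final class Seg {
        final String name;
        final long sizeBytes;
        final int maxDoc;    // total docs, including deleted ones
        final int delCount;  // deleted docs

        Seg(String name, long sizeBytes, int maxDoc, int delCount) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.maxDoc = maxDoc;
            this.delCount = delCount;
        }

        // Deleted docs as a percentage of all docs in the segment.
        double deletePct() {
            return maxDoc == 0 ? 0.0 : 100.0 * delCount / maxDoc;
        }
    }

    // True if the segment may take part in a merge.
    static boolean isMergeCandidate(Seg s, long maxSegmentBytes, double maxDeletePct) {
        if (s.sizeBytes < maxSegmentBytes) {
            return true; // still below the cap, keeps growing
        }
        // At or above the cap: only reopened for compaction.
        return s.deletePct() > maxDeletePct;
    }

    // Names of all segments that are still eligible for merging.
    static List<String> candidates(List<Seg> segs, long maxSegmentBytes, double maxDeletePct) {
        List<String> out = new ArrayList<>();
        for (Seg s : segs) {
            if (isMergeCandidate(s, maxSegmentBytes, maxDeletePct)) {
                out.add(s.name);
            }
        }
        return out;
    }
}
```

With a 4GB cap and a 20% delete threshold, a full segment with 5%
deletes would be left alone, while a full segment with 30% deletes
would become a merge candidate again.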

I think first of all, as we reach the capacity of the machine's
IO and RAM, large segment merges thrash the machine (e.g. the IO
cache is ruined and must be restored, IO is unavailable for
searches, further indexing stops). The segments also become too
large to pass between servers (e.g. via Hadoop, Katta, or Solr's
replication).

I'm not sure how much search degrades with 10-20 large
segments as opposed to a single massive 60GB segment. But if
search is unavailable on a machine due to the CPU and IO
thrashing (of massive segment merges) it seems like a fair

I think optimize remains as is, although I would never call it.
Or we could add an optimize(long maxSegmentSize) method,
analogous to optimize(int maxSegments).
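
To make the optimize(long maxSegmentSize) idea concrete, here is a
hedged plain-Java sketch of how such a call might plan its merges.
No such method exists in the attached patch as far as this message
says; the class and method names below are hypothetical, and the
greedy grouping of adjacent segments is just one possible strategy.
Adjacent segments are packed into a group until adding the next one
would exceed maxSegmentSize, and a segment already at or over the
limit ends up in a group of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner; not part of Lucene.
class SizeBoundedOptimize {

    // Groups adjacent segment sizes so that each multi-segment group
    // totals at most maxSegmentSize. Sizes are in any consistent unit.
    static List<List<Long>> planMerges(long[] sizes, long maxSegmentSize) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentTotal = 0;

        for (long size : sizes) {
            boolean fits = currentTotal + size <= maxSegmentSize;
            if (!fits && !current.isEmpty()) {
                // Close the current group before it would overflow.
                groups.add(current);
                current = new ArrayList<>();
                currentTotal = 0;
            }
            current.add(size);
            currentTotal += size;
            if (currentTotal >= maxSegmentSize) {
                // Group is full (or holds one oversized segment).
                groups.add(current);
                current = new ArrayList<>();
                currentTotal = 0;
            }
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

Groups with a single entry need no merge at all; only groups with
two or more entries would turn into actual merges.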

> Create a MergePolicy that limits the maximum size of it's segments
> ------------------------------------------------------------------
>                 Key: LUCENE-1750
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 3.1
>         Attachments: LUCENE-1750.patch
>   Original Estimate: 48h
>  Remaining Estimate: 48h
> Basically I'm trying to create largish 2-4GB shards using
> LogByteSizeMergePolicy; however, in the attached unit test
> I've found segments that exceed maxMergeMB.
> The goal is for segments to be merged up to 2GB, then all
> merging to that segment stops, and then another 2GB segment is
> created. This helps when replicating in Solr where if a single
> optimized 60GB segment is created, the machine stops working due
> to IO and CPU starvation. 
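
One likely reason segments can exceed maxMergeMB: as far as I
understand LogByteSizeMergePolicy, maxMergeMB bounds the size of
each segment that may be merged with others, not the size of the
merged result. The plain-Java sketch below only illustrates that
arithmetic. The class and method names are hypothetical and the
check is a simplification, not the actual Lucene code.

```java
// Hypothetical illustration; not Lucene code.
class InputCapDemo {

    // Sums the sizes (in MB) of all segments that individually pass
    // a per-input size check, i.e. the size of the merged result if
    // every eligible segment were merged together.
    static long mergedSizeMB(long[] segmentSizesMB, long maxMergeMB) {
        long total = 0;
        for (long size : segmentSizesMB) {
            if (size < maxMergeMB) { // eligible as a merge input
                total += size;
            }
        }
        return total;
    }
}
```

For example, ten 1900MB segments each pass a 2048MB per-input
check, yet merging them yields a 19000MB segment, which is why a
cap on the merged size is what this issue asks for.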

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
