lucene-dev mailing list archives

From robert engels <>
Subject Re: possible segment merge improvement?
Date Fri, 02 Nov 2007 18:50:38 GMT

Sort of (if I understand you).

Eventually the segments (after merging) converge to having the same  
fields in the same order.

New segments are mostly merged only with other new segments (which
probably have the same fields).

When a "newer" segment is merged with an "older" one, you will not be
able to optimize the process (some complex change/mapping code might be
able to do a better job than the current brute-force read all / write
all method).
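
A rough illustration of the distinction, as a sketch only: the class and
method names below are hypothetical and this is not Lucene's actual merge
code. It assumes each segment's field dictionary is a list in which the
position of a name is its field number. When two dictionaries are
identical, field numbers can be carried over unchanged; otherwise every
field number from the source segment has to be translated.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a segment's field dictionary is modeled as a list
// where the index of a field name is its field number.
final class MergePlanSketch {

    /** True if both segments give the same numbers to the same field names. */
    static boolean sameFieldNumbering(List<String> a, List<String> b) {
        return a.equals(b);
    }

    /**
     * Builds a translation table: map[i] is the number that source field i
     * has in the target dictionary, or -1 if the target lacks that field.
     */
    static int[] remap(List<String> source, List<String> target) {
        int[] map = new int[source.size()];
        for (int i = 0; i < source.size(); i++) {
            map[i] = target.indexOf(source.get(i));
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> older = Arrays.asList("body", "title");
        List<String> newer = Arrays.asList("title", "body");

        // Numbering differs, so the cheap path is not available.
        System.out.println(sameFieldNumbering(older, newer)); // false

        // Every field number read from "newer" must go through this table.
        System.out.println(Arrays.toString(remap(newer, older))); // [1, 0]
    }
}
```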

If the fields were always kept sorted, you would have a better chance of
having the field dictionaries of various segments match up.
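
To make the point concrete, here is a minimal sketch with hypothetical
names (it does not use the real FieldInfos class). It compares numbering
fields in the order they are first encountered with numbering them in
sorted name order. Two segments that see the same fields in a different
order get different numbers under the first scheme and identical numbers
under the second. Note that sorting only helps when the segments contain
the same set of field names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: the index of a name in the returned list is the
// field number it would be assigned under each scheme.
final class FieldNumberingSketch {

    /** Numbers fields in the order they are first encountered. */
    static List<String> byFirstSeen(List<String> fieldsInArrivalOrder) {
        return new ArrayList<>(new LinkedHashSet<>(fieldsInArrivalOrder));
    }

    /** Numbers fields in sorted name order, regardless of arrival order. */
    static List<String> sortedByName(List<String> fieldsInArrivalOrder) {
        return new ArrayList<>(new TreeSet<>(fieldsInArrivalOrder));
    }

    public static void main(String[] args) {
        List<String> segmentA = Arrays.asList("title", "body", "date");
        List<String> segmentB = Arrays.asList("date", "title", "body");

        // First-seen numbering depends on arrival order, so these differ.
        System.out.println(byFirstSeen(segmentA).equals(byFirstSeen(segmentB)));

        // Sorted numbering is the same for both segments.
        System.out.println(sortedByName(segmentA).equals(sortedByName(segmentB)));
    }
}
```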

At least for us, our field dictionaries are VERY static, and
constant across all documents (we partition different document types
into separate indexes), so this optimization is a big help.

On Nov 2, 2007, at 1:40 PM, Yonik Seeley wrote:

> On 11/1/07, robert engels <> wrote:
>> I have looked into modifying FieldInfos to keep the fields sorted by
>> field name, so the user would not be forced to add the fields in the
>> same order.
>> Sparse documents are really not a problem, since after the first
>> merge of that document it will pick up the other fields from the other
>> segments, after which it will merge "as the same".
> Only when the field numbers happen to match up, though, right?
> There could be number mismatches far after the first merge, depending
> on what fields were encountered first in those segments.
> Aside: renumbering fields is another area where using byte counts
> instead of char counts should really speed things up.
> -Yonik
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
