lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: possible segment merge improvement?
Date Thu, 01 Nov 2007 06:06:07 GMT
It seems that the following are needed:

FieldInfos.hashCode(); // allows equals() to fail fast
FieldInfos.equals();
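A minimal sketch of what these two methods might look like. It assumes a simplified, hypothetical SimpleFieldInfos class (not Lucene's FieldInfos) in which a field's number is its position in an ordered list. The comparison is deliberately order-sensitive: two segments with the same field names but different numbering cannot be raw-copied, and a cached hash lets most mismatches be rejected without walking the lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// HYPOTHETICAL simplified stand-in for FieldInfos: field number == list index.
final class SimpleFieldInfos {
    private static final class Info {
        final String name;
        final boolean indexed;
        final boolean storeTermVector;

        Info(String name, boolean indexed, boolean storeTermVector) {
            this.name = name;
            this.indexed = indexed;
            this.storeTermVector = storeTermVector;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Info)) return false;
            Info other = (Info) o;
            return name.equals(other.name)
                && indexed == other.indexed
                && storeTermVector == other.storeTermVector;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, indexed, storeTermVector);
        }
    }

    private final List<Info> byNumber = new ArrayList<>();
    private int cachedHash; // 0 means "not computed yet" (recomputed if it really is 0)

    // Appends a field; its field number is its position in the list.
    SimpleFieldInfos add(String name, boolean indexed, boolean storeTermVector) {
        byNumber.add(new Info(name, indexed, storeTermVector));
        cachedHash = 0; // contents changed, so invalidate the cached hash
        return this;
    }

    @Override
    public int hashCode() {
        if (cachedHash == 0) {
            cachedHash = byNumber.hashCode(); // List.hashCode is order-sensitive
        }
        return cachedHash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleFieldInfos)) return false;
        SimpleFieldInfos other = (SimpleFieldInfos) o;
        if (hashCode() != other.hashCode()) return false; // fast failure
        return byNumber.equals(other.byNumber);           // full comparison
    }
}
```

The hash check only speeds up the "not equal" case; equal hashes still require the full comparison.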

For the most efficient buffer reuse during a merge (avoiding GC), add

int FieldsReader.doclength(int doc);
int FieldsReader.binarydoc(int doc,byte[] buffer);

This allows the caller to reuse the existing buffer if it is large
enough, or to allocate a new one.

and

FieldsWriter.addBinaryDocument(byte[] buffer,int len);

All of the above methods are trivial to implement.
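To illustrate how little is involved, here is a sketch of the three proposed methods on hypothetical in-memory classes (ToyFieldsReader and ToyFieldsWriter are illustrative names, not Lucene classes). Each document's stored fields are treated as one opaque byte blob; the real reader and writer would work against the stored-field files on disk instead.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL in-memory stand-in exposing the proposed read methods.
final class ToyFieldsReader {
    private final List<byte[]> docs; // one raw stored-fields blob per document

    ToyFieldsReader(List<byte[]> docs) {
        this.docs = docs;
    }

    int size() {
        return docs.size();
    }

    // Proposed: length in bytes of doc's raw stored-field data.
    int doclength(int doc) {
        return docs.get(doc).length;
    }

    // Proposed: copy doc's raw bytes into the caller's buffer, which must hold
    // at least doclength(doc) bytes. Returns the number of bytes copied.
    int binarydoc(int doc, byte[] buffer) {
        byte[] src = docs.get(doc);
        System.arraycopy(src, 0, buffer, 0, src.length);
        return src.length;
    }
}

// HYPOTHETICAL in-memory stand-in exposing the proposed write method.
final class ToyFieldsWriter {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final List<Integer> pointers = new ArrayList<>(); // start offset per doc

    // Proposed: append the first len bytes of buffer as the next document and
    // record where it starts (roughly the role of the per-document index pointer).
    void addBinaryDocument(byte[] buffer, int len) {
        pointers.add(data.size());
        data.write(buffer, 0, len);
    }

    int docCount() {
        return pointers.size();
    }

    byte[] bytes() {
        return data.toByteArray();
    }
}
```

Because the caller owns the buffer, a merge of many small documents allocates it once and only grows it when a document is larger than anything seen so far.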

SegmentMerger only needs to compare the FieldInfos of the readers being
merged and, if they are all equal, use a short-circuit copy similar
to:

byte[] buffer = new byte[1024];

for each reader {
    for doc in reader {
        if doc not deleted {
            int len = reader.doclength(doc);
            if (len > buffer.length) {
                buffer = new byte[len * 2]; // allow for growth
            }
            reader.binarydoc(doc, buffer);
            newsegment.addBinaryDocument(buffer, len);
        }
    }
}
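The loop above can be made runnable against a toy model. This is a sketch only, assuming a hypothetical Segment class that holds its FieldInfos as an ordered list of field names, one raw stored-fields blob per document, and a BitSet of deletions; it is not Lucene's SegmentMerger. When every segment's FieldInfos match, live documents are copied byte for byte; otherwise it returns null to signal falling back to the existing parse-and-rewrite path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// HYPOTHETICAL toy model of the proposed short-circuit merge; not Lucene code.
final class RawMergeSketch {
    static final class Segment {
        final List<String> fieldInfos; // field names, indexed by field number
        final List<byte[]> storedDocs; // raw stored-fields bytes per document
        final BitSet deleted;          // set bit = deleted document

        Segment(List<String> fieldInfos, List<byte[]> storedDocs, BitSet deleted) {
            this.fieldInfos = fieldInfos;
            this.storedDocs = storedDocs;
            this.deleted = deleted;
        }
    }

    // True when every segment numbers its fields identically.
    static boolean allFieldInfosEqual(List<Segment> segments) {
        if (segments.isEmpty()) return true;
        List<String> first = segments.get(0).fieldInfos;
        for (Segment s : segments) {
            if (!s.fieldInfos.equals(first)) return false;
        }
        return true;
    }

    // Returns the merged stored-field blobs, or null if the caller must fall
    // back to decoding and rewriting each document (field numbers differ).
    static List<byte[]> tryRawMerge(List<Segment> segments) {
        if (!allFieldInfosEqual(segments)) return null;
        List<byte[]> merged = new ArrayList<>();
        byte[] buffer = new byte[1024];
        for (Segment s : segments) {
            for (int doc = 0; doc < s.storedDocs.size(); doc++) {
                if (s.deleted.get(doc)) continue;         // deleted docs are dropped
                byte[] src = s.storedDocs.get(doc);
                int len = src.length;                     // doclength(doc)
                if (len > buffer.length) {
                    buffer = new byte[len * 2];           // allow for growth
                }
                System.arraycopy(src, 0, buffer, 0, len); // binarydoc(doc, buffer)
                // Toy writer keeps a copy; a real one would stream the bytes out.
                merged.add(Arrays.copyOf(buffer, len));   // addBinaryDocument(buffer, len)
            }
        }
        return merged;
    }
}
```

The check is all-or-nothing per merge here for simplicity; a finer-grained variant could decide per segment.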



On Nov 1, 2007, at 12:30 AM, jian chen wrote:

> Hi, Robert,
>
> That's a brilliant idea! Thanks so much for suggesting that.
>
> Cheers,
>
> Jian
>
> On 10/31/07, robert engels <rengels@ix.netcom.com> wrote:
>>
>> Currently, when merging segments, every document is parsed and then
>> rewritten since the field numbers may differ between the segments
>> (compressed data is not uncompressed in the latest versions).
>>
>> It would seem that in many (if not most) Lucene uses, the fields
>> stored within each document in an index are relatively static,
>> probably changing for all documents added after some point X, if at all.
>>
>> Why not check the field dictionaries of the segments being merged
>> and, if they are the same, just copy the binary data directly?
>>
>> In the common case this should be a vast improvement.
>>
>> Has anyone worked on anything like this? Am I missing something?
>>
>> Robert Engels
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>



