lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@lucene.com>
Subject Re: How does delete work?
Date Fri, 22 Nov 2002 21:16:47 GMT
No, in my example optimize() was never called.  The merge rule operates 
recursively.  So, after 99 documents had been added the segment stack 
contained nine indexes with ten documents and nine with one document. 
When the hundredth document was added, the nine one document segments 
were popped of the stack and merged into a single segment that was 
pushed onto the stack.  So then the top of the stack had ten segments 
each containing ten documents, i.e., mergeFactor segments of the same 
size, and these ten segments were then merged into a single segment of 
100 documents.  So adding the 100th document triggered two merges.

(One error in my previous example: the 100 document segment actually 
contains documents 0-99, not 0-100.)

A corollary of this is, when mergeFactor is 10 and no deletions have 
been made, the segments correspond to the digits in the decimal 
representation of the number of documents in the index.  So, an 85 
document index has eight segments with 10 documents and five segments 
with one documents.  (This is somewhat of a simplification, as Lucene 
automatically merges single document segments before ever writing them 
to disk as an optimization.)

It is most beneficial to call IndexWriter.optimize() only when you know 
you won't be adding documents to an index for a while, but will be 
searching it a lot.  Calling optimize() periodically while indexing 
mostly just slows things down.

Doug

Otis Gospodnetic wrote:
> I see, so every mergeFactor documents they are compined into a single
> new segment in the index, and only when optimize() is called do those
> multiple segments get merged into a single segment.
> In your example below that would mean that optimize() was called after
> document 100 was added, hence a single segment with documents 0-100.
> Is this right?
> 
> Thanks,
> Otis
> 
> --- Doug Cutting <cutting@lucene.com> wrote:
> 
>>Merging happens constantly as documents are added.  Each document is 
>>initially added in its own segment, and pushed onto the segment
>>stack. 
>>Whenever there are mergeFactor segments on the top of the stack that
>>are 
>>the same size, these are merged together into a new single segment
>>that 
>>replaces them.  So, if mergeFactor is 10, and you've added 122 
>>documents, the stack will have five segments, as follows:
>>   document 121
>>   document 120
>>   documents 110-119
>>   documents 100-109
>>   documents 0-100
>>The next merge will happen after document 129 is added, when a new 
>>segment will replace the segments for document 120 through document
>>129 
>>with a new single segment.
>>
>>It's actually a little more complicated than that, since (among other
>>
>>reasons) after docuuments are deleted a segment's size will no longer
>>be 
>>exactly a power of the mergeFactor.
>>
>>Doug
>>
>>Otis Gospodnetic wrote:
>>
>>>This is via mergeFactor?
>>>
>>>--- Doug Cutting <cutting@lucene.com> wrote:
>>>
>>>
>>>>The data is actually removed the next time its segment is merged. 
>>>>Optimizing forces it to happen, but it will also eventually happen
>>>
>>as
>>
>>>>more documents are added to the index, without optimization.
>>>>
>>>>Scott Ganyo wrote:
>>>>
>>>>
>>>>>It just marks the record as deleted.  The record isn't actually
>>>>
>>>>removed 
>>>>
>>>>
>>>>>until the index is optimized.
>>>>>
>>>>>Scott
>>>>>
>>>>>Rob Outar wrote:
>>>>>
>>>>>
>>>>>
>>>>>>Hello all,
>>>>>>
>>>>>>   I used the delete(Term) method, then I looked at the index
>>>>>
>>>>files, 
>>>>
>>>>
>>>>>>only one
>>>>>>file changed "_1tx.del"  I found references to the file still in
>>>>>
>>>>some 
>>>>
>>>>
>>>>>>of the
>>>>>>index files, so my question is how does Lucene handle deletes?
>>>>>>
>>>>>>Thanks,
>>>>>>
>>>>>>Rob
>>>>>>
>>>>>>
>>>>>>-- 
>>>>>>To unsubscribe, e-mail:
>>>>>>For additional commands, e-mail: 
>>>>>
>>>>>
>>>>>
>>>>--
>>>>To unsubscribe, e-mail:  
>>>><mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>>>For additional commands, e-mail:
>>>><mailto:lucene-user-help@jakarta.apache.org>
>>>>
>>>
>>>__________________________________________________
>>>Do you Yahoo!?
>>>Yahoo! Mail Plus – Powerful. Affordable. Sign up now.
>>>http://mailplus.yahoo.com
>>>
>>>--
>>>To unsubscribe, e-mail:  
>>
>><mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>
>>>For additional commands, e-mail:
>>
>><mailto:lucene-user-help@jakarta.apache.org>
>>
>>
>>--
>>To unsubscribe, e-mail:  
>><mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>For additional commands, e-mail:
>><mailto:lucene-user-help@jakarta.apache.org>
>>
> 
> 
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Mail Plus – Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
> 


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message