lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Why is the old value still in the index
Date Fri, 16 Dec 2011 21:58:48 GMT
On 16/12/2011 20:54, Paul Taylor wrote:
> On 16/12/2011 17:43, Uwe Schindler wrote:
>> Hi,
>>> I'm adding documents to an index, at a later date I modify a document and
>>> update the index, close the writer and open a new IndexReader. My
>>> indexreader iterates over terms for that field and docFreq() returns one as I
>>> would expect, however the iterator returns both the old value of the document
>>> and the new value, I don't expect (or want) the old value to still be in the index,
>>> so why is this.
>> That is all as expected. Updating documents in a Lucene index is an atomic
>> delete/add operation. Deleting in Lucene just marks the document for
>> deletion, but it is still there (search results won't return it). The
>> consequence is that all terms are still in the terms index and all document
>> frequencies still contain both documents. This *may* cause scoring problems
>> in indexes with many deletes (but those will go away as merging will remove
>> them, see below), but this is known (see wiki, javadocs,...).
>>
>> Once you add more documents the index will merge segments and that will make
>> the deleted documents disappear. If you really want to remove the old
>> documents with all their terms (this is veeeeery expensive), you can call
>> IW.forceMergeDeletes:
>> http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/index/IndexWriter.html#forceMergeDeletes()
>>
>> The way inverted indexes work makes it impossible to update the terms
>> index afterwards.
>>
>> Uwe
>>
>>
> Hi
>
> Thanks, I think you might have it. But tell me: if forceMergeDeletes() is
> a bad idea, is there a query I can run that just returns all docs,
> rather than the iteration I use? (What I want is the value of a
> particular field in each doc.)
>
> Paul
Never mind, I've got it working by adding another field to the index that
always has the same value, which I can search on.

Thanks, Paul
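
For readers following the thread, the behaviour Uwe describes can be illustrated with a small toy model. This is only a sketch in plain Java and not Lucene's implementation: the class `ToySegment` and its methods `add`, `update`, `terms`, `liveValues` and `merge` are hypothetical names invented for this example. It assumes one single-valued field per document and treats an update as "mark the old copy deleted, append the new copy", leaving the term dictionary untouched until a merge rewrites the segment.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy single-segment inverted index. A teaching model only, NOT Lucene code. */
class ToySegment {
    private final List<String> keys = new ArrayList<>();    // docId -> external key
    private final List<String> values = new ArrayList<>();  // docId -> field value
    private final Map<String, Set<Integer>> postings = new TreeMap<>(); // term -> docIds
    private final BitSet deleted = new BitSet();             // deletion marks

    /** Appends a document; its field value becomes a term in the dictionary. */
    void add(String key, String value) {
        int docId = keys.size();
        keys.add(key);
        values.add(value);
        postings.computeIfAbsent(value, t -> new TreeSet<>()).add(docId);
    }

    /** Update = mark every old copy as deleted, then append the new copy.
     *  The postings of the old copy are deliberately left in place. */
    void update(String key, String value) {
        for (int docId = 0; docId < keys.size(); docId++) {
            if (keys.get(docId).equals(key)) {
                deleted.set(docId);
            }
        }
        add(key, value);
    }

    /** Term enumeration reads the dictionary directly, so it still
     *  reports terms that only occur in deleted documents. */
    List<String> terms() {
        return new ArrayList<>(postings.keySet());
    }

    /** Walks the documents and skips the ones marked as deleted. */
    List<String> liveValues() {
        List<String> out = new ArrayList<>();
        for (int docId = 0; docId < keys.size(); docId++) {
            if (!deleted.get(docId)) {
                out.add(values.get(docId));
            }
        }
        return out;
    }

    /** Merging rewrites the segment from live documents only, which is
     *  the point at which stale terms disappear. */
    ToySegment merge() {
        ToySegment merged = new ToySegment();
        for (int docId = 0; docId < keys.size(); docId++) {
            if (!deleted.get(docId)) {
                merged.add(keys.get(docId), values.get(docId));
            }
        }
        return merged;
    }
}
```

In this model, after adding a document with the value "old" and updating it to "new", `terms()` still lists both values, `liveValues()` lists only "new", and `merge().terms()` no longer contains "old". Reading field values from the documents that a search returns, as Paul ended up doing, corresponds to `liveValues()` here; in Lucene itself, `MatchAllDocsQuery` is a query that matches every non-deleted document without needing an extra constant field.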


