incubator-lucy-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-user] Replacing a document in an index
Date Sat, 22 Oct 2011 11:34:11 GMT
On Fri, Oct 21, 2011 at 08:21:20PM +0200, goran kent wrote:
> Can I delete doc/add doc (in effect replacing) in one go?:
> 
> my $index = Lucy::Index::Indexer->new(...create=>0...);
> # delete existing:
> $index->delete_by_term(field=>'docid', term=>$docid);
> # add new:
> $index->add_doc($doc);
> $index->commit;

Yes, that works.

> The man page for Lucy::Index::Indexer(3) states:
> 
> "Note: at present, delete_by_term() and delete_by_query() only affect
> documents which had been previously committed to the index -- and not
> any documents added this indexing session but not yet committed.  This
> may change in a future update."
> 
> I understand this as meaning I cannot add $docid, then try and delete
> the same $docid in the same session (ie, between new() and commit()).
> Since I'm deleting a previously committed document, and re-adding it,
> I should be ok...
> 
> Is my understanding correct?
 
Yes.
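
The delete-then-add pattern discussed above can be wrapped in a small helper.
This is only a sketch: replace_doc() is a hypothetical name, not part of Lucy,
and it assumes the document hash carries its own unique identifier in the
named field. It uses only delete_by_term() and add_doc(), as in the quoted
code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lucy): replace a previously committed
# document within one indexing session.  $indexer is expected to behave
# like a Lucy::Index::Indexer; $id_field names the field holding the
# document's unique identifier.
sub replace_doc {
    my ($indexer, $id_field, $doc) = @_;
    my $id = $doc->{$id_field};
    die "document has no '$id_field' value\n" unless defined $id;

    # Flag the previously committed copy (if any) as deleted.
    $indexer->delete_by_term(field => $id_field, term => $id);

    # Add the replacement in the same session.
    $indexer->add_doc($doc);
    return;
}

# Usage sketch (the index path and field values are placeholders):
#   my $indexer = Lucy::Index::Indexer->new(index => '/path/to/index');
#   replace_doc($indexer, 'docid', { docid => '42', content => 'new text' });
#   $indexer->commit;
```

Because of the limitation quoted from the man page, calling such a helper
twice for the same identifier between new() and commit() would leave both new
copies in the index: the second delete cannot see the uncommitted first add.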

> Finally, I read in the docs somewhere that delete_by_term() will only
> flag a doc as deleted (so it's ignored during searches), but since
> commit() is being called, will the deleted doc be physically removed
> as well?

Maybe, maybe not.  It depends on whether the segment holding the deleted
document gets recycled.

Indexes are made up of segments, with each segment containing one or more
documents.  Segments are recycled periodically; when that happens, valid
documents are rewritten to a new segment, but deleted documents are simply
discarded, and at that point we could say that they are "physically removed".

By default, one or more segments will get recycled either when too many of
them accumulate or when one exceeds a 10% deletion threshold.
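
To make the recycling behaviour concrete, here is a toy model in plain Perl.
It is not Lucy's actual implementation: the data layout, the function names,
and the $MAX_SEGMENTS value are all assumptions made for illustration, and
only the 10% figure comes from the description above. A delete merely flags a
document; the document disappears only when its segment is recycled.

```perl
use strict;
use warnings;

# Toy model only; NOT Lucy's real segment code.
# A segment is an array of docs, each doc is { id => ..., deleted => 0|1 }.
my $DELETION_THRESHOLD = 0.10;   # the 10% figure mentioned above
my $MAX_SEGMENTS       = 10;     # hypothetical stand-in for "too many"

# Fraction of a segment's docs that are flagged as deleted.
sub deletion_ratio {
    my ($segment) = @_;
    return 0 unless @$segment;
    my $deleted = grep { $_->{deleted} } @$segment;
    return $deleted / @$segment;
}

# Flag matching docs as deleted; nothing is physically removed here.
sub delete_by_id {
    my ($segments, $id) = @_;
    for my $segment (@$segments) {
        $_->{deleted} = 1 for grep { $_->{id} eq $id } @$segment;
    }
    return;
}

# Rewrite live docs from qualifying segments into one new segment and
# discard the deleted ones.  Returns the new list of segments.
sub recycle {
    my ($segments) = @_;
    my $too_many = @$segments > $MAX_SEGMENTS;
    my (@kept, @merged);
    for my $segment (@$segments) {
        if ($too_many || deletion_ratio($segment) > $DELETION_THRESHOLD) {
            push @merged, grep { !$_->{deleted} } @$segment;
        }
        else {
            push @kept, $segment;
        }
    }
    push @kept, \@merged if @merged;
    return \@kept;
}

# Number of docs still physically stored, deleted or not.
sub physical_count {
    my ($segments) = @_;
    my $total = 0;
    $total += @$_ for @$segments;
    return $total;
}

my $segments = [
    [ map { +{ id => "a$_", deleted => 0 } } 1 .. 20 ],
    [ map { +{ id => "b$_", deleted => 0 } } 1 .. 5 ],
];

delete_by_id($segments, 'a1');          # 1 of 20 deleted: 5%, under threshold
$segments = recycle($segments);
print physical_count($segments), "\n";  # 25: the deleted doc is still stored

delete_by_id($segments, $_) for qw(a2 a3);   # 3 of 20 deleted: 15%
$segments = recycle($segments);
print physical_count($segments), "\n";  # 22: the segment was rewritten
```

In this model the first deletion survives a recycle pass because its segment
stays under the threshold, which is the "maybe not" case; the later deletions
push the segment over the threshold and all three flagged documents are
dropped together.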

Marvin Humphrey

