lucene-dev mailing list archives

From "Karl Wettin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-879) Document number integrity merge policy
Date Fri, 11 May 2007 20:03:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495145 ]

Karl Wettin commented on LUCENE-879:
------------------------------------

Doron, thanks for the input. 

I have not yet had time to read and think through everything you wrote, but I will tell
you what I'm doing and what I'm aiming at.

I use this patch in conjunction with an Oracle (Sleepycat) BDB object storage. The Lucene
document number (LDN) is used as a secondary key. I do no unmarshalling to objects from data
stored in Lucene fields; I only use Lucene as an index. I never have to read the document from
Lucene. I have no clue how many CPU ticks or bits of RAM this might save me; I'll have to
benchmark that later on. This is just me fooling around with technology solutions for fun, a proof
of concept. There is no real project.

When I update an instance in the object storage, I create a new document in Lucene,
then update the LDN of that instance in the object storage, and then delete the old
document in Lucene.
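
A minimal sketch of that update sequence, using in-memory stand-ins rather than the real Lucene or BDB APIs. The class and method names (UpdateWorkflowSketch, store, addDocument) are hypothetical; the list models the index, with the list position playing the role of the LDN and null marking a deleted document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-ins; these are NOT Lucene or Berkeley DB classes.
class UpdateWorkflowSketch {
    // Stand-in for the index: list position = document number, null = deleted.
    final List<String> index = new ArrayList<>();
    // Stand-in for the object storage: primary key -> current LDN.
    final Map<String, Integer> ldnByPrimaryKey = new HashMap<>();

    // Appends a document and returns the number it was assigned.
    int addDocument(String content) {
        index.add(content);
        return index.size() - 1;
    }

    // Creates or updates the instance identified by pk.
    void store(String pk, String content) {
        Integer oldLdn = ldnByPrimaryKey.get(pk);
        int newLdn = addDocument(content);   // 1. create the new document
        ldnByPrimaryKey.put(pk, newLdn);     // 2. point the stored instance at it
        if (oldLdn != null) {
            index.set(oldLdn, null);         // 3. delete the old document
        }
    }
}
```

Every update hands the instance a new LDN, which is exactly the part I would like to get rid of.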

Even though it works, I do not like this solution. I want to fully retain document number
integrity for updated documents. I believe this can be solved if I limit the guarantee to an
index in an optimized state.

An instance of DocumentIdentityFactory, capable of identifying documents and creating queries
that uniquely identify them, will be passed to the SegmentMerger. It might look at the fields
"_type" and "_pk", or so.

When SegmentMerger.mergeFields reaches a deleted document, it will use the factory to find
replacements for the deleted document in the index. The one with the highest document number
is the latest one and thus the winner. This document will be added at the current position, and
its own document number will be added to a list of document numbers to treat as deleted.
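
The merge pass described above could be sketched roughly as follows. This is not the SegmentMerger code and not part of the attached patch: documents are modelled as plain field maps, IdentityFactory is a hypothetical stand-in for DocumentIdentityFactory, and a null entry in the result marks a slot to treat as deleted. The sketch assumes a replacement always has a higher document number than the document it replaces, which holds when updates are done by adding the new document before deleting the old one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these types exist in Lucene.
class IdentityMergeSketch {
    /** Stand-in for the proposed DocumentIdentityFactory. */
    interface IdentityFactory {
        String identityOf(Map<String, String> doc);
    }

    /** Identity built from the "_type" and "_pk" fields. */
    static final IdentityFactory TYPE_AND_PK =
            doc -> doc.get("_type") + "/" + doc.get("_pk");

    /** Convenience builder for a document with a single "body" field. */
    static Map<String, String> doc(String type, String pk, String body) {
        Map<String, String> d = new HashMap<>();
        d.put("_type", type);
        d.put("_pk", pk);
        d.put("body", body);
        return d;
    }

    /** Returns the merged documents; index = document number, null = deleted. */
    static List<Map<String, String>> merge(List<Map<String, String>> docs,
                                           boolean[] deleted,
                                           IdentityFactory factory) {
        Set<Integer> treatAsDeleted = new HashSet<>();
        List<Map<String, String>> merged = new ArrayList<>();
        for (int n = 0; n < docs.size(); n++) {
            if (!deleted[n]) {
                // Live document: keep it unless it was already moved forward.
                merged.add(treatAsDeleted.contains(n) ? null : docs.get(n));
                continue;
            }
            // Deleted document: look for live replacements with the same identity.
            String id = factory.identityOf(docs.get(n));
            int winner = -1;
            for (int m = n + 1; m < docs.size(); m++) {
                if (!deleted[m] && !treatAsDeleted.contains(m)
                        && id.equals(factory.identityOf(docs.get(m)))) {
                    winner = m; // a later match replaces an earlier one
                }
            }
            if (winner >= 0) {
                merged.add(docs.get(winner)); // winner takes the old number
                treatAsDeleted.add(winner);   // its own slot is now obsolete
            } else {
                merged.add(null);             // no replacement: stays deleted
            }
        }
        return merged;
    }
}
```

In this model the slots listed as deleted are kept as empty positions so that later document numbers do not shift; what an optimize should eventually do with them is left open here.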

Ta-da, and there we have safe(tm) document numbers.


> Document number integrity merge policy
> --------------------------------------
>
>                 Key: LUCENE-879
>                 URL: https://issues.apache.org/jira/browse/LUCENE-879
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 2.1
>            Reporter: Karl Wettin
>            Priority: Minor
>         Attachments: LUNCENE-879.diff
>
>
> This patch allows document numbers to stay the same even after a merge of segments with
> deletions.
> Consumer needs to do this:
> indexWriter.setSkipMergingDeletedDocuments(false);
> The effect will be that deleted documents are replaced by a new Document() in the merged
> segment, but not marked as deleted. This should probably be some policy thingy that allows
> for different solutions such as keeping the old document, etc.
> Also see http://www.nabble.com/optimization-behaviour-tf3723327.html#a10418880
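
To illustrate the difference the quoted setting makes, here is a small sketch that contrasts the two merge behaviours on a plain list. It does not use Lucene; the class and method names are hypothetical, and setSkipMergingDeletedDocuments itself exists only in the attached patch. An empty string stands in for the empty new Document().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two behaviours; not Lucene code.
class PlaceholderMergeSketch {
    static final String EMPTY = ""; // stand-in for an empty new Document()

    /** Regular merge: deleted documents are dropped, survivors are renumbered. */
    static List<String> mergeSkippingDeleted(List<String> docs, boolean[] deleted) {
        List<String> merged = new ArrayList<>();
        for (int n = 0; n < docs.size(); n++) {
            if (!deleted[n]) {
                merged.add(docs.get(n));
            }
        }
        return merged;
    }

    /** Patched merge: a placeholder keeps every document at its old number. */
    static List<String> mergeKeepingNumbers(List<String> docs, boolean[] deleted) {
        List<String> merged = new ArrayList<>();
        for (int n = 0; n < docs.size(); n++) {
            merged.add(deleted[n] ? EMPTY : docs.get(n));
        }
        return merged;
    }
}
```

With three documents where the middle one is deleted, the regular merge moves the last document from number 2 to number 1, while the placeholder variant leaves it at number 2.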

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

