lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet
Date Fri, 05 Dec 2008 18:44:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653889#action_12653889 ]

Michael McCandless commented on LUCENE-1476:
--------------------------------------------


bq. It seemed wrong to pay the method call overhead for IndexReader.isDeleted() on each iter
in NOTScorer.next() or MatchAllScorer.next(), when we could just store the next deletion:

Nice!  This is what I had in mind.

I think we could [almost] do this across the board for Lucene.
SegmentTermDocs would similarly store nextDeleted and apply the same
"AND NOT" logic.
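
To make the nextDeleted idea concrete, here is a rough sketch of the
"AND NOT" loop.  It is not the actual MatchAllScorer or SegmentTermDocs
code: the class name, the liveDocs method and the sorted int[] of
deletions are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not Lucene's MatchAllScorer or SegmentTermDocs. */
final class NextDeletedSketch {

    /**
     * Returns every doc id in [0, maxDoc) that is not in sortedDeletes.
     * Assumes sortedDeletes is strictly ascending.
     */
    static List<Integer> liveDocs(int maxDoc, int[] sortedDeletes) {
        List<Integer> live = new ArrayList<>();
        int delIdx = 0;
        // Next deleted doc id, or maxDoc as a sentinel once none remain.
        int nextDeleted = sortedDeletes.length > 0 ? sortedDeletes[0] : maxDoc;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (doc == nextDeleted) {
                // Deleted: skip it and advance to the following deletion.
                delIdx++;
                nextDeleted = delIdx < sortedDeletes.length
                        ? sortedDeletes[delIdx] : maxDoc;
                continue;
            }
            // Live doc: a plain int compare, no isDeleted() call per iteration.
            live.add(doc);
        }
        return live;
    }
}
```

The per-doc cost is one int comparison against a cached value; the
deletions structure is only touched when a deletion is actually hit.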

bq. that's because IndexReader.isDeleted() isn't exposed and because IndexReader.fetchDoc(int
docNum) returns the doc even if it's deleted

Hmm -- that is very nicely enabling.

bq. I've actually been trying to figure out a new design for deletions because writing them
out for big segments is our last big write bottleneck

One approach would be to use a "segmented" model: if a few
deletions are added, write them to a new "deletes segment", so that a
single "normal segment" would have multiple deletion files
associated with it.  These would have to be merged (via an iterator)
when used during searching, and periodically coalesced.
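
A minimal sketch of that merge step, assuming each deletion file has
already been loaded as an ascending int[] of non-negative doc ids.  The
class and method names are hypothetical, and a real version would expose
an iterator rather than materialize an array; the array just keeps the
example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of merging several per-segment deletion lists. */
final class MergedDeletesSketch {

    /** Merges ascending doc id lists into one ascending list without duplicates. */
    static int[] merge(List<int[]> deleteFiles) {
        // Each queue entry is {currentDocId, fileIndex, positionInFile}.
        PriorityQueue<int[]> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < deleteFiles.size(); i++) {
            int[] file = deleteFiles.get(i);
            if (file.length > 0) {
                queue.add(new int[] {file[0], i, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        int last = -1; // doc ids are assumed to be >= 0
        while (!queue.isEmpty()) {
            int[] top = queue.poll();
            if (top[0] != last) {
                // The same doc may be deleted in more than one file; emit it once.
                merged.add(top[0]);
                last = top[0];
            }
            int[] file = deleteFiles.get(top[1]);
            int next = top[2] + 1;
            if (next < file.length) {
                queue.add(new int[] {file[next], top[1], next});
            }
        }
        int[] out = new int[merged.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = merged.get(i);
        }
        return out;
    }
}
```

Coalescing would amount to writing the merged result back out as a
single deletion file and dropping the small ones.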

bq. if we only need iterator access, we can use vbyte encoding instead

Right: if there are relatively few deletes against a segment, encoding
the "on bits" directly (or deltas) should be a decent win since
iteration is much faster.
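
As an illustration of encoding the deltas, here is a sketch that writes
the gaps between ascending deleted doc ids as variable-length bytes
(7 data bits per byte, high bit set when more bytes follow).  The class
and method names are made up for the example, and it is not a proposed
file format.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of delta + vbyte encoding for sparse deletions. */
final class VByteDeletesSketch {

    /** Encodes ascending, non-negative doc ids as vbyte-encoded deltas. */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocIds) {
            int delta = doc - prev;
            // Emit 7 bits at a time, low-order first; high bit means "more".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            prev = doc;
        }
        return out.toByteArray();
    }

    /** Decodes count doc ids previously written by encode. */
    static int[] decode(byte[] bytes, int count) {
        int[] docs = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta; // undo the delta to recover the absolute doc id
            docs[i] = prev;
        }
        return docs;
    }
}
```

With few deletes the gaps are large but the list is short, so the cost
is a handful of bytes per deletion, and decoding is naturally sequential.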


> BitVector implement DocIdSet
> ----------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> BitVector can implement DocIdSet.  This is for making SegmentReader.deletedDocs pluggable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

