lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: IndexWriter.deleteDocuments(Query query)
Date Thu, 02 Apr 2009 19:35:32 GMT
On Thu, Apr 2, 2009 at 2:26 PM, John Wang <> wrote:
> Hi Michael:
>    Thanks for looking into this.
>    Approach 2 has a dependency on how fast the delete set performs a check
> on a given id, approach one doesn't. After replacing my delete set with a
> simple bitset, approach 2 gets a 25-30% improvement.


>   I understand if the delete set is small, approach 1 would be faster,
> while approach two has a more constant/deterministic performance. I would
> also save from indexing the UID term into the index if going with approach
> two.


>   I don't however see how column-stride fields would help here, isn't it a
> generalization of what I am doing?

Sorry, yes, and I shouldn't have said "much faster".  What I'm
picturing with column stride fields is that you'd be able to load an
int[] per segment, mapping docID -> UID.  That load may be faster than
the decode process you do now, though probably not by much.
If we do the inverted column stride field, then you'd have an array
mapping UID -> docID, which should be faster (load time would be the
same, but you could then visit only the deleted UIDs instead of
sweeping all docs).
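
To make the difference concrete, here is a rough sketch in plain Java of
the two lookup directions. It does not use any Lucene API: the class
name, method names, and the sample data are all made up for
illustration, and a HashMap stands in for the inverted UID -> docID
array so that sparse UIDs don't need a huge array. It also assumes UIDs
are small non-negative ints so the delete set fits in a BitSet.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene code: compares sweeping a docID -> UID
// array against looking up deleted UIDs in an inverted UID -> docID map.
class DeleteLookupSketch {

    // Forward mapping: visit every docID in the segment and test its UID
    // against the delete set. Cost grows with the number of docs.
    static List<Integer> sweepAllDocs(int[] docIdToUid, BitSet deletedUids) {
        List<Integer> docsToDelete = new ArrayList<>();
        for (int docId = 0; docId < docIdToUid.length; docId++) {
            if (deletedUids.get(docIdToUid[docId])) {
                docsToDelete.add(docId);
            }
        }
        return docsToDelete;
    }

    // Build the inverted mapping (assumes each UID appears at most once).
    static Map<Integer, Integer> invert(int[] docIdToUid) {
        Map<Integer, Integer> uidToDocId = new HashMap<>();
        for (int docId = 0; docId < docIdToUid.length; docId++) {
            uidToDocId.put(docIdToUid[docId], docId);
        }
        return uidToDocId;
    }

    // Inverted mapping: visit only the deleted UIDs. Cost grows with the
    // size of the delete set, not with the number of docs.
    static List<Integer> visitDeletedOnly(Map<Integer, Integer> uidToDocId,
                                          BitSet deletedUids) {
        List<Integer> docsToDelete = new ArrayList<>();
        for (int uid = deletedUids.nextSetBit(0); uid >= 0;
                 uid = deletedUids.nextSetBit(uid + 1)) {
            Integer docId = uidToDocId.get(uid);
            if (docId != null) {  // UID may not live in this segment
                docsToDelete.add(docId);
            }
        }
        return docsToDelete;
    }

    public static void main(String[] args) {
        int[] docIdToUid = {42, 7, 19, 3};  // made-up segment of 4 docs
        BitSet deleted = new BitSet();
        deleted.set(7);
        deleted.set(3);
        deleted.set(100);                   // not present in this segment

        System.out.println(sweepAllDocs(docIdToUid, deleted));
        System.out.println(visitDeletedOnly(invert(docIdToUid), deleted));
    }
}
```

Both methods find the same docIDs, only in a different order (docID
order for the sweep, UID order for the inverted lookup).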

>  BTW, can you shine some light on why would IndexWriter move docids around
> when it is opened and no docs has been added to it?

Actually, sorry, I'm wrong about this: in IndexWriter.init, we don't
actually kick off merges.

Though I don't think it's safe to rely on that (Lucene could someday,
e.g. if the index was closed with close(false), then it may need merging on

