lucene-java-user mailing list archives

From Rene Hackl-Sommer <>
Subject Re: "Deleting" documents without deleting them
Date Tue, 16 Mar 2010 16:59:55 GMT
I cannot comment on the "marked-as-deleted" documents, but the approach 
I outlined might indeed impact the scores. I prefer to say 'impact' 
rather than 'skew', because to me 'skew' would imply that the original 
scores are some kind of ideal state that gets distorted. I don't think 
that is necessarily the case with term weight shifts.

It really depends on the specific setup. If there are millions of 
documents in the index, and some of them count ten times and others a 
hundred times toward the statistical figures (through their versions, 
not as real physical copies), I don't think this would lead to a 
significant change overall. With a large index, I would be surprised if 
it changed precision by anything drastic, say 5%.
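
For illustration, here is a minimal sketch, assuming the classic idf formula of Lucene's DefaultSimilarity (2.x/3.x), idf = 1 + ln(numDocs / (docFreq + 1)). The class name IdfShift, its methods, and all document counts are hypothetical. It compares a term's idf over the current documents only with its idf once outdated versions are counted as well, first with the versions spread evenly across all documents and then with them concentrated on the documents that contain the term.

```java
/**
 * Hypothetical sketch: how counting outdated document versions shifts idf.
 * Assumes the classic DefaultSimilarity formula: idf = 1 + ln(numDocs / (docFreq + 1)).
 * All counts and multipliers below are made up for illustration.
 */
public class IdfShift {

    /** Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /**
     * Relative idf change when numDocs is inflated by overallFactor
     * (all versions counted) while docFreq is inflated by termFactor
     * (versions of the documents that contain the term).
     */
    static double relativeShift(long docFreq, long numDocs, long termFactor, long overallFactor) {
        double currentOnly = idf(docFreq, numDocs);
        double withVersions = idf(docFreq * termFactor, numDocs * overallFactor);
        return (withVersions - currentOnly) / currentOnly;
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000; // hypothetical: current documents in the index
        long docFreq = 1_000;     // hypothetical: current documents containing the term

        // Every document has roughly ten versions: both counts grow together.
        double uniform = relativeShift(docFreq, numDocs, 10, 10);
        // Documents containing the term were revised far more often than average.
        double skewed = relativeShift(docFreq, numDocs, 100, 10);

        System.out.printf("uniform (x10 everywhere):       %+.4f%%%n", 100 * uniform);
        System.out.printf("skewed (term docs x100, all x10): %+.4f%%%n", 100 * skewed);
    }
}
```

When all documents gain versions at a similar rate, numDocs and docFreq grow together and the idf barely moves; a term whose documents are revised much more often than average loses idf. How much such per-term shifts change the overall ranking quality is the setup-dependent question above.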

And if marginal shifts are troublesome, you can always maintain two 
indexes: one with all the document versions, for reference when 
required, and another with only the current documents for everyday 
searches.


On 16.03.2010 14:05, TCK wrote:
> Wouldn't these excluded/filtered documents skew the scores even though they
> are supposed to be marked as deleted? Don't the idf values used in scoring
> depend on the entire document set and not just the matching hits for a
> query?
> Thanks,
> On Tue, Mar 16, 2010 at 5:45 AM, Rene Hackl-Sommer <> wrote:
