lucene-solr-user mailing list archives

From Yonik Seeley <ysee...@gmail.com>
Subject Re: Skewed IDF in multi lingual index, again
Date Tue, 05 Dec 2017 12:38:34 GMT
On Tue, Dec 5, 2017 at 5:15 AM, alessandro.benedetti
<a.benedetti@sease.io> wrote:
> "Lucene/Solr doesn't actually delete documents when you delete them, it
> just marks them as deleted.  I'm pretty sure that the difference between
> docCount and maxDoc is deleted documents.  Maybe I don't understand what
> I'm talking about, but that is the best I can come up with. "
>
> Thanks Shawn, yes, that is correct and I was aware of it.
> I was curious about another difference.
> I think we confirmed that docCount is local to the field (thanks, Yonik, for
> that), so:
>
> docCount(index,field1)= # of documents in the index that currently have
> value(s) for field1
>
> My question is:
>
> maxDoc(index,field1)= max # of documents in the index that had value(s) for
> field1
>
> OR
>
> maxDoc(index)= max # of documents that appeared in the index (field
> independent)

The latter.
I imagine that's why docCount was introduced (to avoid changing the
meaning of an existing term).
FWIW, the scoring change was made in
https://issues.apache.org/jira/browse/LUCENE-6711 for Lucene/Solr 6.0
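
To make the distinction concrete, here is a minimal Java sketch. It does not use Lucene's classes: the in-memory index model and every name in it (SkewedIdfSketch, buildIndex, the body_en/body_it fields and their terms) are hypothetical. It assumes the BM25 IDF form ln(1 + (N - df + 0.5) / (df + 0.5)) used by Lucene's BM25Similarity. It compares N = maxDoc (field independent) with N = docCount (field local) for a term that appears in 10% of each language's documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, NOT Lucene's API: each document maps a field name
// to the set of terms indexed for it; a document simply lacks unused fields.
final class SkewedIdfSketch {

    // maxDoc: every document in the index, independent of any field.
    static long maxDoc(List<Map<String, Set<String>>> docs) {
        return docs.size();
    }

    // docCount: documents with at least one term in the given field.
    static long docCount(List<Map<String, Set<String>>> docs, String field) {
        return docs.stream()
                .filter(d -> d.containsKey(field) && !d.get(field).isEmpty())
                .count();
    }

    // docFreq: documents whose given field contains the term.
    static long docFreq(List<Map<String, Set<String>>> docs, String field, String term) {
        return docs.stream()
                .filter(d -> d.getOrDefault(field, Set.of()).contains(term))
                .count();
    }

    // BM25-style IDF: ln(1 + (n - df + 0.5) / (df + 0.5)), where n is either
    // maxDoc (field independent) or docCount (field local).
    static double bm25Idf(long docFreq, long n) {
        return Math.log(1 + (n - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Toy bilingual index (assumed data): 900 English docs and 100 Italian
    // docs, with the query term present in 10% of each language's docs.
    static List<Map<String, Set<String>>> buildIndex() {
        List<Map<String, Set<String>>> docs = new ArrayList<>();
        for (int i = 0; i < 900; i++) {
            docs.add(Map.of("body_en", i < 90 ? Set.of("hello") : Set.of("other")));
        }
        for (int i = 0; i < 100; i++) {
            docs.add(Map.of("body_it", i < 10 ? Set.of("ciao") : Set.of("altro")));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, Set<String>>> docs = buildIndex();
        for (String[] q : new String[][] {{"body_en", "hello"}, {"body_it", "ciao"}}) {
            long df = docFreq(docs, q[0], q[1]);
            System.out.printf("%s:%s  idf(maxDoc)=%.3f  idf(docCount)=%.3f%n",
                    q[0], q[1],
                    bm25Idf(df, maxDoc(docs)),
                    bm25Idf(df, docCount(docs, q[0])));
        }
    }
}
```

With maxDoc as N, the Italian term's IDF comes out around 4.56 versus about 2.26 with docCount, while the English term barely moves (about 2.40 vs 2.30). Under maxDoc, terms from the smaller language look artificially rare, which is the skew this thread is about.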

-Yonik
