lucene-java-user mailing list archives

From Robert Muir <>
Subject Re: Designing a multilingual index
Date Tue, 03 Jan 2012 15:19:11 GMT
On Tue, Jan 3, 2012 at 10:10 AM, Paul Libbrecht <> wrote:
> I think the idf is also about terms and not about tokens.
> Maybe an expert can confirm my belief or we have to invent a test.

idf is computed from docFreq and maxDoc.
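For reference, here is a minimal standalone sketch of how the two statistics combine. It assumes the classic Lucene 3.x DefaultSimilarity formula, idf = 1 + ln(maxDoc / (docFreq + 1)), where the searcher supplies maxDoc as the document count. The class name ClassicIdf is hypothetical, and the snippet does not call into Lucene.

```java
public class ClassicIdf {
    // Assumed classic (Lucene 3.x DefaultSimilarity-style) idf:
    // idf = 1 + ln(maxDoc / (docFreq + 1))
    // docFreq: documents containing the term in this field
    // maxDoc:  all documents in the index, regardless of field
    static double idf(int docFreq, int maxDoc) {
        return Math.log(maxDoc / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // A term found in 9 of 100 documents: 1 + ln(10), roughly 3.3026
        System.out.println(idf(9, 100));
    }
}
```

The "+ 1" inside the ratio keeps the logarithm finite when docFreq is 0, and the outer "+ 1.0" keeps idf positive even for very common terms.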

docFreq is per-field; maxDoc is not. In practice this might not even matter, though.

If you are concerned about it, for example because you have multiple
languages in different fields and some of those fields are sparse, you
can look at Lucene's trunk, which has a "per-field maxDoc"
(Terms.docCount): the count of all documents that have at least one
indexed term for the field.
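To show why a sparse language field can skew idf, here is a toy model (hypothetical, not Lucene's API) with 8 English-only and 2 German-only documents. It computes docFreq per field, maxDoc over the whole index, and a per-field docCount, then plugs each denominator into the same classic-style formula assumed above. Swapping docCount in for maxDoc is an illustration of the idea only; it does not claim that any particular Lucene Similarity does exactly this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class PerFieldIdf {
    // Hypothetical toy index: each document maps a field name to its set of terms.
    static double idf(int docFreq, int n) {
        return Math.log(n / (double) (docFreq + 1)) + 1.0;
    }

    // Per-field docFreq: documents whose given field contains the term.
    static int docFreq(List<Map<String, Set<String>>> docs, String field, String term) {
        int count = 0;
        for (Map<String, Set<String>> doc : docs) {
            Set<String> terms = doc.get(field);
            if (terms != null && terms.contains(term)) count++;
        }
        return count;
    }

    // Per-field docCount: documents with at least one term in the field.
    static int docCount(List<Map<String, Set<String>>> docs, String field) {
        int count = 0;
        for (Map<String, Set<String>> doc : docs) {
            Set<String> terms = doc.get(field);
            if (terms != null && !terms.isEmpty()) count++;
        }
        return count;
    }

    // 8 English documents and 2 German documents; "hund" occurs in one German doc.
    static List<Map<String, Set<String>>> sampleIndex() {
        List<Map<String, Set<String>>> docs = new ArrayList<>();
        for (int i = 0; i < 8; i++) docs.add(Map.of("body_en", Set.of("dog", "cat")));
        docs.add(Map.of("body_de", Set.of("hund", "katze")));
        docs.add(Map.of("body_de", Set.of("katze")));
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, Set<String>>> docs = sampleIndex();
        int maxDoc = docs.size();                  // index-wide: 10
        int df = docFreq(docs, "body_de", "hund"); // per-field: 1
        int dc = docCount(docs, "body_de");        // per-field: 2
        System.out.printf(Locale.ROOT, "maxDoc=%d docFreq=%d docCount=%d%n", maxDoc, df, dc);
        System.out.printf(Locale.ROOT, "idf(maxDoc)=%.4f idf(docCount)=%.4f%n",
                idf(df, maxDoc), idf(df, dc));
    }
}
```

With maxDoc as the denominator, "hund" gets idf of about 2.61 and looks rare, even though it appears in half of the German documents. With the per-field docCount it gets 1.0, which reflects its frequency within its own language.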

