lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Designing a multilingual index
Date Tue, 03 Jan 2012 15:19:11 GMT
On Tue, Jan 3, 2012 at 10:10 AM, Paul Libbrecht <paul@hoplahup.net> wrote:
> I think the idf is also about terms and not about tokens.
> Maybe an expert can confirm my belief or we have to invent a test.
>

idf is computed from docFreq and maxDoc.

docFreq is per-field; maxDoc is not (it is index-wide). This might not
even matter, though.

If you are concerned about it in a situation where you have multiple
languages in different fields and some are sparse, you can look at
Lucene's trunk, which has a "per-field maxDoc" (Terms.docCount): the
count of all documents that have at least one indexed term for the
field.
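
As a rough illustration of why this can matter for a sparse field, here
is a minimal sketch in plain Java with no Lucene dependency. It assumes
the classic idf formula 1 + ln(numDocs / (docFreq + 1)); the class name,
the method name, and every count below are hypothetical, chosen only to
show the effect.

```java
// Sketch only: assumes the classic idf formula, 1 + ln(numDocs / (docFreq + 1)).
// The class name, method name, and all counts are made up for illustration.
class IdfSketch {

    /** Classic idf: the lower docFreq is relative to numDocs, the higher the value. */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000; // hypothetical: every document in the index
        long docCount = 1_000;   // hypothetical: documents with at least one term in the sparse field
        long docFreq = 500;      // hypothetical: documents containing the term in that field

        // Same term, same docFreq; only the denominator's document count changes.
        System.out.printf("idf with index-wide maxDoc:  %.3f%n", idf(docFreq, maxDoc));
        System.out.printf("idf with per-field docCount: %.3f%n", idf(docFreq, docCount));
    }
}
```

With these made-up numbers the term occurs in half of the documents that
actually have the field, yet against the index-wide maxDoc it looks rare
and gets a much larger idf than it does against the per-field count.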

-- 
lucidimagination.com
