lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Skewed IDF in multi lingual index, again
Date Thu, 30 Nov 2017 16:14:38 GMT
Hello,

We discussed this problem five years ago [1]. In short: for some terms, documents in foreign
languages receive higher scores.

Back then it was solved by using docCount instead of maxDoc when calculating idf, and that
worked really well! But, probably due to changes in the index, the problem is back for some
terms, mostly proper nouns, just like five years ago.
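A minimal sketch of why docCount helped. It assumes each language has its own field (the names content_en and content_de and every document count below are hypothetical) and uses the Lucene-style BM25 idf, log(1 + (N - df + 0.5) / (df + 0.5)). With N = maxDoc, a term that is equally common in both languages gets a much higher idf in the smaller language field. With N = that field's docCount, the two idf values come out almost the same.

```java
// Hypothetical illustration, not taken from the index discussed here:
// compares idf computed with maxDoc against idf computed with per-field docCount.
public class IdfSkewDemo {

    // Lucene-style BM25 idf: log(1 + (n - docFreq + 0.5) / (docFreq + 0.5)),
    // where n is either maxDoc or the field's docCount.
    static double idf(long docFreq, long n) {
        return Math.log(1.0 + (n - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000;    // hypothetical: all documents in the index
        long docCountEn = 900_000;  // hypothetical: documents with a content_en value
        long docCountDe = 100_000;  // hypothetical: documents with a content_de value

        // A term that occurs in 1% of the documents of each language.
        long dfEn = 9_000;
        long dfDe = 1_000;

        System.out.printf("maxDoc:   en=%.3f de=%.3f%n",
                idf(dfEn, maxDoc), idf(dfDe, maxDoc));
        System.out.printf("docCount: en=%.3f de=%.3f%n",
                idf(dfEn, docCountEn), idf(dfDe, docCountDe));
    }
}
```

With these made-up numbers the maxDoc variant gives roughly 4.71 for English versus 6.91 for German, while the docCount variant gives about 4.6 for both.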

We already deboost documents that are not in the user's preferred language by 0.7, but in
some cases that is not enough. I could keep reducing that boost, but I would rather not.
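A rough sketch of why a fixed 0.7 deboost cannot always compensate. It assumes BM25-style scoring, where a single-term score is idf multiplied by a tf/length factor, and assumes that factor is equal for both documents. The counts are hypothetical: a proper noun that occurs in 1% of a large English field but in only 10 documents of a smaller German field. Under these assumptions the deboost keeps the preferred-language document on top only while the foreign idf is less than 1 / 0.7 ≈ 1.43 times the preferred idf.

```java
// Hypothetical illustration: does a multiplicative 0.7 deboost offset an idf gap?
// Assumes score = idf * (tf/length factor) with an equal factor for both documents.
public class DeboostDemo {

    static final double FOREIGN_DEBOOST = 0.7; // deboost for non-preferred languages

    // Lucene-style BM25 idf based on the field's docCount.
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // True if the deboosted foreign-language document still outscores
    // the document in the user's preferred language.
    static boolean foreignStillWins(double idfPreferred, double idfForeign) {
        return FOREIGN_DEBOOST * idfForeign > idfPreferred;
    }

    public static void main(String[] args) {
        double idfEn = idf(9_000, 900_000); // hypothetical: preferred language, common name
        double idfDe = idf(10, 100_000);    // hypothetical: foreign language, rare name

        System.out.printf("en=%.3f de=%.3f de*%.1f=%.3f foreign wins=%b%n",
                idfEn, idfDe, FOREIGN_DEBOOST, FOREIGN_DEBOOST * idfDe,
                foreignStillWins(idfEn, idfDe));
        System.out.printf("idf ratio=%.3f, break-even ratio=%.3f%n",
                idfDe / idfEn, 1.0 / FOREIGN_DEBOOST);
    }
}
```

With these numbers the foreign idf is roughly twice the preferred one, so the deboosted foreign document still wins; a smaller boost only moves the break-even ratio.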

I'd like to know if there are additional tricks to solve the problem.

Many thanks!
Markus

[1] http://lucene.472066.n3.nabble.com/Skewed-IDF-in-multi-lingual-index-td4019095.html
