lucene-solr-user mailing list archives

From Isaac Hebsh
Subject Bad fieldNorm when using morphologic synonyms
Date Thu, 05 Dec 2013 18:53:11 GMT
We implemented a morphological analyzer that stems words at index time.
For various reasons, we index both the original word and its stem (at the
same position, of course).
Stemming is applied to one specific language only, so text in other
languages is not stemmed at all.

Because of that, two documents with the same number of terms may have
term vectors of different sizes. A document that contains many stemmed
words will have a term vector about twice as large. This behaviour affects
the relevance score in a BAD way: the lower fieldNorm of these documents
reduces their score. This is NOT the behaviour we want in our case.
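
To make the effect described above concrete, here is a minimal sketch in plain Java (no Lucene dependency). It assumes the classic TF-IDF length norm, 1/sqrt(number of tokens), and ignores boosts and the lossy one-byte norm encoding. The class and method names are hypothetical and exist only for illustration. The discountOverlaps flag mimics the idea of not counting tokens that share a position with the previous token (position increment 0); Lucene's DefaultSimilarity exposes a setting of that name, but whether it applies to the setup described here is an assumption.

```java
// Hypothetical illustration only; not Lucene code.
class FieldNormSketch {

    // Classic length norm: fields with more counted tokens get a smaller norm.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Counts the tokens that contribute to the field length.
    // posIncrements[i] is the position increment of token i; a stem stacked
    // on its original word has increment 0. When discountOverlaps is true,
    // such stacked tokens are not counted.
    static int fieldLength(int[] posIncrements, boolean discountOverlaps) {
        int length = 0;
        for (int inc : posIncrements) {
            if (discountOverlaps && inc == 0) {
                continue; // skip tokens that overlap the previous position
            }
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        // Four words in a language that is not stemmed.
        int[] unstemmed = {1, 1, 1, 1};
        // Four words, each followed by its stem at the same position.
        int[] stemmed = {1, 0, 1, 0, 1, 0, 1, 0};

        System.out.println("unstemmed norm:           "
                + lengthNorm(fieldLength(unstemmed, false)));
        System.out.println("stemmed norm (all tokens): "
                + lengthNorm(fieldLength(stemmed, false)));
        System.out.println("stemmed norm (discounted): "
                + lengthNorm(fieldLength(stemmed, true)));
    }
}
```

When every token is counted, the stemmed document gets a smaller norm than the unstemmed one even though both contain four words. When the stacked tokens are discounted, the two norms are equal.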

We are looking for a way to "mark" the stemmed words (at index time, of
course) so they won't affect the fieldNorm. Does such a way exist?

Do you have another idea?
