lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: Bad fieldNorm when using morphologic synonyms
Date Fri, 06 Dec 2013 07:48:21 GMT
Hi Isaac,

Did you consider omitting norms completely for that field (omitNorms="true" in the schema)?
Are you using solr.RemoveDuplicatesTokenFilterFactory?
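
For readers unfamiliar with that filter: it drops a token when another token with the same text has already been seen at the same position, which matters here when the stem is identical to the original word. The following is only a rough Python sketch of that idea, not Solr code; the `(text, position_increment)` tuples and the function name are made up for illustration.

```python
from typing import Iterable, List, Tuple

# (text, position_increment); an increment of 0 means "same position as the
# previous token", which is how a stacked stem would normally be emitted.
Token = Tuple[str, int]

def remove_duplicates(tokens: Iterable[Token]) -> List[Token]:
    """Drop tokens whose text was already seen at the current position."""
    kept: List[Token] = []
    seen_at_position = set()
    for text, increment in tokens:
        if increment > 0:
            # We moved to a new position, so forget the previous texts.
            seen_at_position = set()
        if text in seen_at_position:
            continue
        seen_at_position.add(text)
        kept.append((text, increment))
    return kept

# "the" stems to itself, so its stacked copy is removed; "cat" is kept.
stream = [("the", 1), ("the", 0), ("cats", 1), ("cat", 0)]
deduplicated = remove_duplicates(stream)
```

Removing such duplicates only helps for words whose stem equals the surface form; words with a different stem still add an extra token.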



On Thursday, December 5, 2013 8:55 PM, Isaac Hebsh <isaac.hebsh@gmail.com> wrote:
 
Hi,
we implemented a morphologic analyzer, which stems words at index time.
For various reasons, we index both the original word and the stem (at the same
position, of course).
The stemming is done for one specific language, so other languages are not
stemmed at all.

Because of that, two documents with the same number of terms may have
different termVector sizes. A document that contains many stemmed words
will have a termVector up to twice as large. This behaviour affects the
relevance score in a BAD way: the fieldNorm of these documents reduces
their score. This is NOT the wanted behaviour in our case.
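
To make the size of the effect concrete, here is a small sketch of the classic Lucene length norm, which is 1/sqrt(number of terms in the field) when index-time boosts and the lossy single-byte encoding of the stored norm are ignored. It assumes every stacked stem is counted as a term; the function name and the 100-word documents are hypothetical.

```python
import math

def length_norm(num_terms: int) -> float:
    """Classic TF-IDF length norm: 1 / sqrt(number of indexed terms)."""
    return 1.0 / math.sqrt(num_terms)

# Two hypothetical 100-word documents. In the second one every word also
# gets a stem indexed at the same position, so 200 terms are counted.
unstemmed_norm = length_norm(100)
stemmed_norm = length_norm(200)

# The ratio is 1/sqrt(2): the fully stemmed document keeps roughly 71% of
# the norm of the unstemmed one, and its score shrinks accordingly.
ratio = stemmed_norm / unstemmed_norm
```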

We are looking for a way to "mark" the stemmed words (at index time, of
course) so they won't affect the fieldNorm. Does such a way exist?
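
One way to picture such "marking" is to count only tokens that advance the position when computing the field length. Lucene's DefaultSimilarity exposes a discountOverlaps setting that works along these lines for tokens with a position increment of 0; whether it applies here depends on the analyzer really emitting stems with increment 0 and on the setting being enabled, so treat that as something to verify. The Python below only mimics the idea and is not Lucene code; the token tuples are made up.

```python
import math
from typing import List, Tuple

# (text, position_increment); increment 0 marks a token stacked on the
# previous position, e.g. an injected stem.
Token = Tuple[str, int]

def length_norm(tokens: List[Token], discount_overlaps: bool) -> float:
    """1/sqrt(length), optionally ignoring tokens with increment 0."""
    if discount_overlaps:
        length = sum(1 for _, increment in tokens if increment > 0)
    else:
        length = len(tokens)
    return 1.0 / math.sqrt(length) if length else 0.0

tokens = [("running", 1), ("run", 0), ("dogs", 1), ("dog", 0)]

with_stems_counted = length_norm(tokens, discount_overlaps=False)  # 4 terms
stems_discounted = length_norm(tokens, discount_overlaps=True)     # 2 terms
```

With the overlaps discounted, the norm matches that of an unstemmed two-word field, which is the behaviour asked for above.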

Do you have another idea?