lucy-user mailing list archives

From "serkanmulayim@gmail.com"<serkanmula...@gmail.com>
Subject [lucy-user] C library - Scoring mechanism
Date Tue, 21 Nov 2017 01:09:02 GMT
Hi guys,

I have a question regarding how relevancy is scored. Is the scoring mechanism tf/idf when
the field is indexed with the EasyAnalyzer in the schema? What happens when a query contains
multiple terms? Are the tf/idf scores summed? How does the scoring incorporate the positions
of the words for queries with multiple words?

How about fields that use a RegexTokenizer? Is it still the same mechanism? Does the type
of tokenizer affect the scoring? I believe what matters is the generated tokens (not the
tokenizer that produced them), and maybe the order of the tokens in a document.

One more thing: if I wanted to change the scoring mechanism for different fields, how could I
do it? Are there any predefined mechanisms, e.g. tf/idf, doc2vec, etc.? And if I want to go
further and come up with my own, how can I do it?

Thanks,
Serkan


