lucene-java-user mailing list archives

From Jacek Grzebyta <grzebyta....@gmail.com>
Subject Penalize fact the searched term is within a word
Date Thu, 08 Jun 2017 11:56:30 GMT
Hi,

Apologies for repeating a question from the IRC room, but I am not sure
whether that channel is still alive.

I have no idea how Lucene works, but I need to modify a part of the rdf4j
project that depends on it.

I need to use Lucene to create a mapping file based on text searching, and I
found the following problem. Take the term 'abcd', which gets mapped to node
'abcd-2' even though node 'abcd' exists. The issue is that Lucene searches
for the term, finds it in both nodes 'abcd' and 'abcd-2', and gives both the
same score. My question is: how can I modify the scoring to penalise a match
where the searched term is only part of a longer word, and give a higher
score where the term is a whole word by itself?

Visually it looks like this:

node 'abcd':
  - name: abcd

total score = LS /lucene score/ * 2.0 /name weight/



node 'abcd-2':
   - name: abcd-2
   - alias1: abcd-h
   - alias2: abcd-k9

total score = LS * 2.0 + LS * 0.5 /alias1 score/ + LS * 0.1 /alias2 score/

I gave different weights to the properties. "Name" has the highest weight,
but each "alias" has a small weight as well. The total score for a node is
the sum of every partial score * weight. As a result, 'abcd-2' ends up with
a higher score than 'abcd'.

thanks,
Jacek
