lucene-java-user mailing list archives

From "Uwe Goetzke"
Subject Scoring algorithm suggestion?
Date Thu, 18 Oct 2007 13:07:10 GMT
We have used Lucene in our product since version 1.2.

I have developed a new bigram stemmer and would like a suggestion on
how to implement the scorer it needs.


Using a Boolean query with a slope, most of the time I get the correct


For example, the bigram split for "document" is:


do oc cu um me en nt
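
The split above can be sketched as follows. This is only a minimal illustration of overlapping character bigrams, not the stemmer itself; the class and method names (BigramSplit, split) are hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not a Lucene class).
final class BigramSplit {

    // Returns the overlapping two-character substrings of a word, in order.
    // Words shorter than two characters yield an empty list.
    static List<String> split(String word) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            grams.add(word.substring(i, i + 2));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints "do oc cu um me en nt"
        System.out.println(String.join(" ", split("document")));
    }
}
```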


If a user searches for the misspelled "documnts"


I use a Boolean query with a slope depending on the length of the search

This works quite well, as

do oc cu um mn nt ts

gives 6 correct terms.
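
The count of matching terms can be checked with a small sketch. It assumes that a "correct term" is a query bigram that also occurs among the bigrams of the indexed word; the names BigramOverlap, split and matching are hypothetical. Under that assumption, the singular "document" shares five bigrams with "documnts" (do oc cu um nt), and "ts" adds a sixth only when the indexed word is the plural "documents".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only (not a Lucene class).
final class BigramOverlap {

    // Overlapping character bigrams of a word, in order.
    static List<String> split(String word) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            grams.add(word.substring(i, i + 2));
        }
        return grams;
    }

    // Counts the query bigrams that also occur in the indexed word.
    // Position is ignored here, as in a plain Boolean OR over the terms.
    static int matching(String query, String indexed) {
        Set<String> indexedGrams = new HashSet<>(split(indexed));
        int hits = 0;
        for (String gram : split(query)) {
            if (indexedGrams.contains(gram)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(matching("documnts", "document"));  // prints 5
        System.out.println(matching("documnts", "documents")); // prints 6
    }
}
```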


In addition, I want terms that follow each other in the same order in
the indexed doc to get a higher score.

In this case we have 5 terms in the correct order, which should give
the doc a boost of 4 (relatively speaking).
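
One way to make the ordering idea concrete, outside of Lucene, is to take the longest common subsequence (LCS) of the query bigrams and the indexed bigrams. This is a swapped-in technique chosen for illustration, not an existing Lucene scorer, and the boost formula (terms in order minus one) is only an assumed reading of "5 terms, boost of 4". The names OrderedBigramScore, inOrder and boost are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not a Lucene class).
final class OrderedBigramScore {

    // Overlapping character bigrams of a word, in order.
    static List<String> split(String word) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            grams.add(word.substring(i, i + 2));
        }
        return grams;
    }

    // Length of the longest common subsequence of the two bigram lists,
    // i.e. the largest number of query terms found in the same relative
    // order in the indexed word (gaps are allowed).
    static int inOrder(List<String> query, List<String> indexed) {
        int[][] dp = new int[query.size() + 1][indexed.size() + 1];
        for (int i = 1; i <= query.size(); i++) {
            for (int j = 1; j <= indexed.size(); j++) {
                if (query.get(i - 1).equals(indexed.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[query.size()][indexed.size()];
    }

    // Assumed boost: one less than the number of terms in order, never negative.
    static int boost(String query, String indexed) {
        return Math.max(0, inOrder(split(query), split(indexed)) - 1);
    }

    public static void main(String[] args) {
        System.out.println(inOrder(split("documnts"), split("document"))); // prints 5
        System.out.println(boost("documnts", "document"));                 // prints 4
    }
}
```

For "documnts" against "document" the in-order terms are do oc cu um nt, which gives 5 terms and, under the assumed formula, a boost of 4.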


What type of query should I base the development of my scorer on?




Uwe Goetzke

development manager



Healy Hudson GmbH  

Nelkenstrasse 43

67691 Hochspeyer


