lucene-java-user mailing list archives

From Otis Gospodnetic <>
Subject Re: Scoring algorithm suggestion?
Date Fri, 19 Oct 2007 02:19:13 GMT
I don't have the answer to your main question, but will point you to the ngram set of tokenizers
in Lucene's contrib/, in case you want to use that instead of maintaining your own bi-gram


----- Original Message ----
From: Uwe Goetzke <>
Sent: Thursday, October 18, 2007 9:07:10 AM
Subject: Scoring algorithm suggestion?

We have used Lucene in our product since version 1.2.

I have developed a new bigram stemmer and would like a suggestion on
how to implement the scorer it needs.


Using a Boolean query with a slope, most of the time I get the correct


For example, the bigram split for "document" is


do oc cu um me en nt
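
A minimal sketch of this split in Java, assuming the stemmer simply emits every overlapping two-character substring from left to right. The class and method names are illustrative and are not taken from the actual stemmer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class BigramSplitter {
    private BigramSplitter() {}

    /** Returns the overlapping two-character substrings of word, left to right. */
    static List<String> split(String word) {
        List<String> bigrams = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            bigrams.add(word.substring(i, i + 2));
        }
        return bigrams;
    }
}
```

Calling BigramSplitter.split("document") yields the seven terms listed above; a word shorter than two characters yields an empty list.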


If a user searches for the misspelled "documnts"


I use a Boolean query with a slope depending on the length of the

This works quite well, as

do oc cu um mn nt ts

gives 6 correct terms.
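
The count of matching terms can be reproduced with a short sketch. It assumes a "correct term" is a query bigram that also occurs somewhere among the bigrams of the indexed word; the class and method names are illustrative, and this is not how Lucene itself scores a Boolean query.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only.
final class BigramOverlap {
    private BigramOverlap() {}

    /** Returns the overlapping two-character substrings of word, left to right. */
    static List<String> split(String word) {
        List<String> bigrams = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            bigrams.add(word.substring(i, i + 2));
        }
        return bigrams;
    }

    /** Counts the query bigrams that occur anywhere in the indexed word, ignoring position. */
    static int countMatches(String query, String indexed) {
        Set<String> indexedBigrams = new HashSet<>(split(indexed));
        int matches = 0;
        for (String bigram : split(query)) {
            if (indexedBigrams.contains(bigram)) {
                matches++;
            }
        }
        return matches;
    }
}
```

Under this assumption, BigramOverlap.countMatches("documnts", "documents") returns 6, while the same query against the indexed word "document" returns 5.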


In addition, I want terms that follow each other in the indexed doc in
the same order as in the query to get a higher score.

In this case we have 5 terms in the correct order, which should give
the doc a boost of 4 (relatively speaking).
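
One possible reading of this requirement, sketched in Java: count the pairs of neighbouring query bigrams that are also neighbours, in the same order, in the indexed word. This is only an assumption about the intended boost, not a Lucene scorer, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only.
final class BigramOrderBoost {
    private BigramOrderBoost() {}

    /** Returns the overlapping two-character substrings of word, left to right. */
    static List<String> split(String word) {
        List<String> bigrams = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            bigrams.add(word.substring(i, i + 2));
        }
        return bigrams;
    }

    /** Counts neighbouring query bigram pairs that are also neighbours in the indexed word. */
    static int countOrderedPairs(String query, String indexed) {
        List<String> indexedBigrams = split(indexed);
        Set<String> indexedPairs = new HashSet<>();
        for (int i = 0; i + 1 < indexedBigrams.size(); i++) {
            // Both bigrams have length 2, so the 4-character key is unambiguous.
            indexedPairs.add(indexedBigrams.get(i) + indexedBigrams.get(i + 1));
        }
        List<String> queryBigrams = split(query);
        int pairs = 0;
        for (int i = 0; i + 1 < queryBigrams.size(); i++) {
            if (indexedPairs.contains(queryBigrams.get(i) + queryBigrams.get(i + 1))) {
                pairs++;
            }
        }
        return pairs;
    }
}
```

Under this reading, BigramOrderBoost.countOrderedPairs("documnts", "documents") returns 4 (do-oc, oc-cu, cu-um, nt-ts). A real scorer would need term positions from the index instead of the raw word.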


What type of query should I base the development of my scorer on?




Uwe Goetzke

development manager



Healy Hudson GmbH  

Nelkenstrasse 43

67691 Hochspeyer
