lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Scoring algorithm suggestion?
Date Fri, 19 Oct 2007 02:19:13 GMT
Uwe,
I don't have an answer to your main question, but I can point you to the n-gram tokenizers
in Lucene's contrib/, in case you want to use those instead of maintaining your own bigram
tokenizer.

Otis

----- Original Message ----
From: Uwe Goetzke <uwe.goetzke@healy-hudson.com>
To: java-user@lucene.apache.org
Sent: Thursday, October 18, 2007 9:07:10 AM
Subject: Scoring algorithm suggestion?


We have used Lucene in our product since version 1.2.

I have developed a new bigram stemmer and would like suggestions on how
to implement the scorer it needs.

Using a Boolean query with a slope, I get the correct documents most of
the time.

For example, the bigram split for "document" is

do oc cu um me en nt
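
The split above can be reproduced with a few lines of plain Java. This is only a sketch of overlapping character bigrams; the class and method names are hypothetical, and it is neither the poster's stemmer nor one of the contrib tokenizers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class BigramSplit {
    // Returns the overlapping two-character substrings of a word, in order.
    static List<String> bigrams(String word) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            out.add(word.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: do oc cu um me en nt
        System.out.println(String.join(" ", bigrams("document")));
    }
}
```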

 

If a user searches for the misspelled "documnts", I use a Boolean query
with a slope that depends on the length of the search term.

This works quite well, as

do oc cu um mn nt ts

gives 6 correct terms.
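
The count of matching terms can be checked outside Lucene with a plain set intersection. The sketch below is an illustration only, with hypothetical names. It assumes the indexed word is the plural "documents", which yields the 6 shared bigrams mentioned above; against the singular "document" the same count is 5.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class BigramOverlap {
    // Returns the overlapping two-character substrings of a word, in order.
    static List<String> bigrams(String word) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            out.add(word.substring(i, i + 2));
        }
        return out;
    }

    // Counts the distinct query bigrams that also occur in the indexed word.
    static int sharedBigrams(String query, String indexed) {
        Set<String> shared = new HashSet<>(bigrams(query));
        shared.retainAll(new HashSet<>(bigrams(indexed)));
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(sharedBigrams("documnts", "documents")); // prints 6
        System.out.println(sharedBigrams("documnts", "document"));  // prints 5
    }
}
```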

 

In addition, I want terms that follow each other in the indexed document
in the same order to get a higher score.

In this case we have 5 terms in the correct order, which should give
the document a boost of 4 (relatively speaking).
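
One possible reading of this order bonus is to count the pairs of neighbouring query bigrams that are also neighbours, in the same order, in the indexed word. The sketch below implements only that interpretation in plain Java, with hypothetical names; it is not a Lucene scorer and does not say which query type to build on. If the indexed word is assumed to be "documents", it returns 4 for the query "documnts".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class BigramOrderBoost {
    // Returns the overlapping two-character substrings of a word, in order.
    static List<String> bigrams(String word) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            out.add(word.substring(i, i + 2));
        }
        return out;
    }

    // Counts adjacent query bigram pairs that are also adjacent,
    // in the same order, in the indexed word.
    static int adjacentPairsInOrder(String query, String indexed) {
        List<String> doc = bigrams(indexed);
        Set<String> docPairs = new HashSet<>();
        for (int i = 0; i + 1 < doc.size(); i++) {
            docPairs.add(doc.get(i) + " " + doc.get(i + 1));
        }
        List<String> q = bigrams(query);
        int count = 0;
        for (int i = 0; i + 1 < q.size(); i++) {
            if (docPairs.contains(q.get(i) + " " + q.get(i + 1))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Matching pairs: do-oc, oc-cu, cu-um, nt-ts
        System.out.println(adjacentPairsInOrder("documnts", "documents")); // prints 4
    }
}
```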

 

What type of query should I base my scorer on?

 

 

Regards

Uwe Goetzke

development manager

________________________________________________

Healy Hudson GmbH
Nelkenstrasse 43
67691 Hochspeyer

uwe.goetzke@healy-hudson.com
http://www.healy-hudson.com








