Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (athena.apache.org: local policy)
Content-Class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C81187.CB5661D8"
Subject: Scoring algorithm suggestion?
Date: Thu, 18 Oct 2007 15:07:10 +0200
Message-ID: <D7206503EB5BE94497458AF4FE4AE8AB453B2E@mzex01.healy-hudson.com>
Thread-Topic: Scoring algorithm suggestion?
thread-index: AcgRh8tD/KAY25uFQ0qpATqYFkNK3Q==
From: "Uwe Goetzke" <uwe.goetzke@healy-hudson.com>
To: <java-user@lucene.apache.org>

------_=_NextPart_001_01C81187.CB5661D8
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

We have been using Lucene in our product since version 1.2.

I have developed a new bigram stemmer and would like a suggestion on how
to implement the scorer it needs.

Using a Boolean query with a slop, I get the correct documents most of
the time.

For example, the bigram split for "document" is:

do oc cu um me en nt
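
A minimal plain-Java sketch of this split, assuming simple overlapping two-character substrings with no padding at the word boundaries. The class `Bigrams` and the method `split` are hypothetical names, not part of Lucene and not the actual stemmer, which may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a word into overlapping character bigrams. */
final class Bigrams {
    private Bigrams() {}

    /** Returns the overlapping two-character substrings of word, in order. */
    static List<String> split(String word) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            grams.add(word.substring(i, i + 2));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints "do oc cu um me en nt"
        System.out.println(String.join(" ", split("document")));
    }
}
```

In this sketch, words shorter than two characters yield an empty list.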

Suppose a user searches for the misspelled "documnts".

I use a Boolean query with a slop that depends on the length of the
search term.

This works quite well, as the query split

do oc cu um mn nt ts

shares 5 terms with the split of "document" (do, oc, cu, um, nt).
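
The matching step can be sketched without Lucene as below: count how many query bigrams also occur in the indexed term and compare that count with a length-dependent minimum. The rule in `minimumMatches` (tolerate roughly one missing bigram per three, but at least one) is a hypothetical stand-in for the length-dependent slop; in Lucene the closest built-in notion is probably `BooleanQuery.setMinimumNumberShouldMatch`, though that mapping is an assumption. All class and method names here are illustrative. By this count, "documnts" shares 5 bigrams with "document" (do, oc, cu, um, nt).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of bigram matching with a length-dependent threshold. */
final class BigramMatch {
    private BigramMatch() {}

    /** Overlapping two-character substrings of word, in order. */
    static List<String> split(String word) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            grams.add(word.substring(i, i + 2));
        }
        return grams;
    }

    /** Number of query bigrams that also occur somewhere in the document term. */
    static int countMatches(String query, String docTerm) {
        Set<String> docGrams = new HashSet<>(split(docTerm));
        int matches = 0;
        for (String gram : split(query)) {
            if (docGrams.contains(gram)) {
                matches++;
            }
        }
        return matches;
    }

    /** Hypothetical rule: allow about one miss per three bigrams, at least one. */
    static int minimumMatches(int queryBigrams) {
        int allowedMisses = Math.max(1, queryBigrams / 3);
        return Math.max(1, queryBigrams - allowedMisses);
    }

    /** True if enough query bigrams match the document term. */
    static boolean accepts(String query, String docTerm) {
        int required = minimumMatches(split(query).size());
        return countMatches(query, docTerm) >= required;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("documnts", "document")); // prints 5
        System.out.println(minimumMatches(7));                     // prints 5
        System.out.println(accepts("documnts", "document"));       // prints true
    }
}
```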

In addition, I want terms that appear in the indexed document in the
same order as in the query to get a higher score.

In this example, 5 terms appear in the correct order, which should give
the document a boost of 4 (relatively speaking).
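
One possible way to quantify this ordering bonus, sketched in plain Java: walk the query bigrams in order, look up each matching one in the indexed term, and count adjacent pairs of matches whose positions increase. For the example this gives 4 (do < oc < cu < um < nt), which agrees with the boost of 4 above. The class and method names are hypothetical, positions are taken from the first occurrence of each bigram in the term (a simplification), and a real Lucene scorer would instead read term positions from the index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an "in the same order" bonus for bigram matches. */
final class OrderBonus {
    private OrderBonus() {}

    /** Overlapping two-character substrings of word, in order. */
    static List<String> split(String word) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            grams.add(word.substring(i, i + 2));
        }
        return grams;
    }

    /**
     * Counts adjacent pairs of matching query bigrams whose positions in the
     * document term increase. Non-matching bigrams are skipped, so a pair can
     * span a gap (e.g. "um" ... "nt" across the misspelled "mn").
     */
    static int orderedPairs(String query, String docTerm) {
        List<String> docGrams = split(docTerm);
        int pairs = 0;
        int previousPos = -1; // document position of the previous matching bigram
        for (String gram : split(query)) {
            int pos = docGrams.indexOf(gram); // first occurrence only
            if (pos < 0) {
                continue; // not in the document: keep the previous match
            }
            if (previousPos >= 0 && pos > previousPos) {
                pairs++;
            }
            previousPos = pos;
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(orderedPairs("documnts", "document")); // prints 4
    }
}
```

Counting pairs rather than matched terms means a query with n matches in perfect order earns n - 1, which fits the "5 terms, boost of 4" example.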

Which type of query should I base the development of my scorer on?

Regards

Uwe Goetzke

development manager

________________________________________________

Healy Hudson GmbH

Nelkenstrasse 43

67691 Hochspeyer

mailto:uwe.goetzke@healy-hudson.com

http://www.healy-hudson.com

-----------------------------------------------------------------------
Healy Hudson GmbH - D-55252 Mainz Kastel
Managing Director: Christian Konhauser - Registered at Amtsgericht Wiesbaden, HRB 12076

This email is confidential. If you are not the intended recipient, you
must not disclose or use the information contained in it. If you have
received this email in error, please tell us immediately by return email
and delete the document.


------_=_NextPart_001_01C81187.CB5661D8--