Subject: Scoring algorithm suggestion?
Date: Thu, 18 Oct 2007 15:07:10 +0200
From: "Uwe Goetzke"
Reply-To: java-user@lucene.apache.org

We have used Lucene in our product since version 1.2.
I have developed a new bigram stemmer and would like a suggestion on how to implement the scorer it needs.

Using a Boolean query with a slop, I get the correct documents most of the time.

For example, the bigram split for "document" is

    do oc cu um me en nt

If a user searches for the misspelled "documnts", I use a Boolean query with a slop that depends on the length of the search term. This works quite well, as

    do oc cu um mn nt ts

gives 6 correct terms.

In addition, I want terms that follow each other in the indexed document in the same order as in the query to get a higher score. In this case we have 5 terms in the correct order, which should give the document a boost of 4 (relatively speaking).

What type of query should I base the development of my scorer on?

Regards

Uwe Goetzke
development manager

Healy Hudson GmbH
Nelkenstrasse 43
67691 Hochspeyer
mailto:uwe.goetzke@healy-hudson.com
http://www.healy-hudson.com
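
To make the ordering idea above concrete, here is a minimal, language-neutral sketch in Python. It is not Lucene code and not the stemmer described in this message. The function names `bigrams`, `in_order_matches`, and `order_boost` are hypothetical, and the rule "boost = in-order terms minus one" is only an assumption that happens to reproduce the "5 terms in order, boost of 4" figures above.

```python
def bigrams(word):
    """Split a word into overlapping two-character terms."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def in_order_matches(query_terms, doc_terms):
    """Count the terms that appear in both lists in the same relative order.

    This is the length of the longest common subsequence (LCS),
    computed with the standard dynamic-programming table.
    """
    rows, cols = len(query_terms), len(doc_terms)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if query_terms[i - 1] == doc_terms[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


def order_boost(query, indexed_word):
    """Hypothetical boost: in-order bigram count minus one, never negative.

    Assumption only; the message does not define how the boost is derived.
    """
    in_order = in_order_matches(bigrams(query), bigrams(indexed_word))
    return max(in_order - 1, 0)


if __name__ == "__main__":
    print(bigrams("document"))
    print(bigrams("documnts"))
    print(in_order_matches(bigrams("documnts"), bigrams("document")))
    print(order_boost("documnts", "document"))
```

For the query "documnts" against the indexed word "document", this sketch finds 5 bigrams in matching order (do, oc, cu, um, nt). In a real scorer the comparison would run on term positions from the index, not on the raw strings.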