Subject: Re: Federated relevance ranking
From: Toke Eskildsen
To: "java-user@lucene.apache.org"
Organization: State and University Library, Denmark
Date: Mon, 6 Jun 2011 11:05:33 +0200

On Thu, 2011-06-02 at 21:51 +0200, Clint Gilbert wrote:
> We're also considering a home-grown
> scheme involving normalizing the
> denominators of all the index components in all our indices, based on
> the sums of counts obtained from all the indices. This feels like
> re-inventing the wheel, and it's not clear to me yet that the low-level
> manipulation of indices that we'd need to do is even possible.

We're currently experimenting with this approach, albeit only for two searchers. Since we have very little control over the secondary searcher (just a basic search API), we're really hacking: we perform a query rewrite based on term statistics. This only works for basic term queries (no wildcards, ranges, etc.), but fortunately our search logs show that those are by far the most common.

The math is not too bad: extract occurrence counts for the terms from each searcher and sum them. When sending a request to a specific searcher, calculate the difference between the summed counts and that searcher's own counts, and set a term boost in the textual query so that the standard ranking formula in Lucene will yield the desired score.
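
The boost calculation can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code referred to above: it assumes Lucene's classic TF-IDF similarity, where idf(t) = 1 + ln(numDocs / (docFreq + 1)) and idf enters the score squared, so the boost is taken as (global idf / local idf) squared. The class and method names (TermBoostSketch, idf, boost, boostedTerm) and the counts in main are made up for the example, and query normalization is ignored.

```java
// Sketch only: assumes Lucene's classic TF-IDF similarity. All names here are
// hypothetical and nothing in this class calls into Lucene itself.
final class TermBoostSketch {

    /** Classic idf: 1 + ln(numDocs / (docFreq + 1)). Expects numDocs >= 1. */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    /**
     * Boost for one term on one searcher. Because idf appears squared in the
     * classic score, multiplying by (globalIdf / localIdf)^2 replaces the
     * searcher's local idf^2 with the idf^2 derived from the summed counts.
     */
    static double boost(long localDocFreq, long localNumDocs,
                        long globalDocFreq, long globalNumDocs) {
        double ratio = idf(globalDocFreq, globalNumDocs) / idf(localDocFreq, localNumDocs);
        return ratio * ratio;
    }

    /** Renders the term with a boost using the query parser's term^boost syntax. */
    static String boostedTerm(String term, double boost) {
        return String.format(java.util.Locale.ROOT, "%s^%.4f", term, boost);
    }

    public static void main(String[] args) {
        // Made-up counts for two searchers, A and B, for the same term.
        long dfA = 9, docsA = 1000;
        long dfB = 989, docsB = 9000;

        // Sum the counts across searchers to get the "global" statistics.
        long dfAll = dfA + dfB;
        long docsAll = docsA + docsB;

        // Each searcher receives its own rewritten query.
        System.out.println("A: " + boostedTerm("lucene", boost(dfA, docsA, dfAll, docsAll)));
        System.out.println("B: " + boostedTerm("lucene", boost(dfB, docsB, dfAll, docsAll)));
    }
}
```

A term that is rare in one searcher but common overall gets a boost below 1 there, and a term that is locally common but globally rare gets a boost above 1. For multi-term queries the same calculation would be applied per term.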