Date: Thu, 22 Oct 2015 18:29:57 +0000
From: "Bauer, Herbert S. (Scott)"
To: "java-user@lucene.apache.org"
Subject: Re: Scoring over Multiple Indexes

Thanks for your reply. We've recently moved from a single large index
to multiple indexes. Given that the content loaded into these indexes
represents individually curated terminologies, I think we can argue to
our users that the results of combined queries over the multiple indexes
are as meaningful in their own right as those run over the monolithic
index. We had to consider that our changes to the back end of our
application might change the sort order of results, which is what we
normally want to avoid.

On 10/22/15, 10:43 AM, "Erick Erickson" wrote:

>In a word, no. At least not that I've heard of. "Normalizing scores" is
>one of those things that sounds reasonable on the surface, but is
>really meaningless. Scores don't really _tell_ you anything about the
>abstract "goodness" of a doc, they just tell you that doc1 is likely
>better than doc2 _within a single query_. You can't even compare scores
>in the _same_ index across two different queries.
>
>At its lowest level, say one index has 1,000,000 occurrences of
>"erick", while index 2 has exactly 1. Term frequency is one of the
>numbers that is used to calculate the score. How does one normalize the
>part of the calculation resulting from matching "erick" between the two
>indexes? Anything you do is wrong.
>
>Similarly, expecting documents to be returned in a particular order
>because of boosting is not going to be satisfactory. Boosting will
>influence the final score and thus the position of the document, but it
>will not absolutely order them unless you put in insane boosts. Tests
>based on boosting and doc ordering will be very fragile, I'd guess.
>
>Best,
>Erick
>
>On Thu, Oct 22, 2015 at 8:34 AM, Bauer, Herbert S. (Scott) wrote:
>> We have a test case that boosts a set of terms, something along the
>> lines of "term1^2 AND term2^3 AND term3^4", and this query runs over
>> two content-distinct indexes. Our expectation is that the terms would
>> be returned to us as term3, term2 and term1. Instead we get something
>> along the lines of term3, term1 and term2. I realize from a number of
>> postings that this is the result of scoring taking place within each
>> individual index rather than across several indexes. At the same time
>> I don't see a lot of solutions offered. Is there an out-of-the-box
>> solution to normalize scoring over diverse indexes? If not, is there
>> a strategy for rolling your own normalizing solution? I'm assuming
>> this has to be a common problem. -scott
>>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
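
To make the point about per-index term statistics concrete, here is a minimal arithmetic sketch. It is not Lucene code and not the full scoring formula: it assumes the classic TF-IDF shape (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), per-term score roughly tf * idf^2 * boost) and leaves out length norms, coord and queryNorm. The class name ScoreDemo, the index size of 2,000,000 documents and the document frequencies are all made up for illustration.

```java
/** Simplified classic TF-IDF arithmetic; an illustration only, not Lucene's full scoring. */
final class ScoreDemo {

    /** Inverse document frequency in the classic form: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Term-frequency factor: square root of the term's frequency in the document. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Simplified per-term score: tf * idf^2 * boost (length norm, coord, queryNorm omitted). */
    static double score(int freq, long docFreq, long numDocs, double boost) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * boost;
    }

    public static void main(String[] args) {
        long numDocs = 2_000_000L; // hypothetical size of each index

        // Index A: the term is very common (1,000,000 matching docs), boosted ^4.
        double common = score(1, 1_000_000L, numDocs, 4.0);
        // Index B: the term matches exactly one doc, boosted only ^2.
        double rare = score(1, 1L, numDocs, 2.0);

        System.out.printf("boost 4, common term: %.2f%n", common);
        System.out.printf("boost 2, rare term:   %.2f%n", rare);
        System.out.println("lower boost ranks first: " + (rare > common));
    }
}
```

Under these assumed numbers the hit with the smaller boost outscores the hit with the larger one, purely because each score was computed from a different index's statistics. That is the same kind of reordering as in the term1/term2/term3 test case, and it is why a test that asserts a document order derived from boosts alone tends to be fragile.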