From: Michael Froh
Date: Mon, 27 Jan 2020 14:17:21 -0800
Subject: Re: Scoring Across Multiple Fields
To: java-user@lucene.apache.org
Hi John,

A TermQuery produces a scorer that computes the similarity of a given term value against a given field, in the context of the index. So, as you say, it produces a score for one field.

If you want to match a given term value across multiple fields, you could indeed use a BooleanQuery with the TermQueries in SHOULD clauses. The vanilla BooleanQuery produces a score that is the sum of all matching clauses' scores (or at least that's the interpretation I get from reading the source code of the explain() method in BooleanWeight).

You can also look into DisjunctionMaxQuery, which works like a disjunctive BooleanQuery but returns the maximum score across matching clauses. The idea here is that if, say, you're matching across title and body fields, a title match may score higher (perhaps because it's been boosted). If you sum the scores across fields, you're likely just inflating those title matches even more (since a title match is probably highly correlated with a body match).

(DisjunctionMaxQuery also has an optional "tieBreakerMultiplier" property that you can use to weight the scoring somewhere between pure max and pure sum, like "use the maximum score, plus 0.001 times the sum of the rest".)

Hope that helps,
Michael

On Mon, 27 Jan 2020 at 13:37, John Brown wrote:
> Hi,
>
> I have a question regarding how Lucene computes document similarities from
> field similarities.
>
> Lucene's scoring documentation mentions that scoring works on fields and
> combines the results to return documents. I'm assuming fields are given
> scores, and those scores are simply averaged to return the document score?
>
> If this is the case, then in order to incorporate multiple fields in my
> scoring, I would use multiple term queries that contain the same term, but
> target different fields, then I would simply put them in a boolean query,
> and search my index using this boolean query.
>
> Am I going about this in the correct way? Any clarification would be
> greatly appreciated.
>
> Thank you,
> John B
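To make the difference between the combination rules in the reply concrete, here is a small arithmetic sketch. It is plain Java with no Lucene dependency: the per-field scores are made-up numbers standing in for what a TermQuery on each field might return, and the class and method names are hypothetical. It assumes the behaviour described in the reply, namely that a BooleanQuery of SHOULD clauses sums the matching clauses' scores and that DisjunctionMaxQuery returns the maximum plus tieBreakerMultiplier times the sum of the other matching clauses. In real Lucene code the queries themselves would be built with `new BooleanQuery.Builder()`, adding each `TermQuery` with `BooleanClause.Occur.SHOULD`, or with `new DisjunctionMaxQuery(queries, tieBreakerMultiplier)`; the sketch only models how the resulting scores combine.

```java
/**
 * Toy model (hypothetical, not Lucene code) of combining per-field scores
 * into a single document score. A field that does not match contributes 0.
 */
public class FieldScoreCombination {

    /** Pure sum of the matching clauses, as described for a SHOULD-only BooleanQuery. */
    static double sum(double[] fieldScores) {
        double total = 0.0;
        for (double s : fieldScores) {
            total += s;
        }
        return total;
    }

    /**
     * Maximum clause score plus tieBreaker times the sum of the remaining scores,
     * as described for DisjunctionMaxQuery. tieBreaker = 0 gives pure max,
     * tieBreaker = 1 gives the same result as the pure sum.
     */
    static double disMax(double[] fieldScores, double tieBreaker) {
        double max = 0.0;
        double total = 0.0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
            total += s;
        }
        // total - max is the sum of every score except the single largest one
        return max + tieBreaker * (total - max);
    }

    public static void main(String[] args) {
        // Hypothetical scores for one document: a (boosted) title match and a body match.
        double[] scores = {3.0, 1.0};
        System.out.println("sum               = " + sum(scores));
        System.out.println("max               = " + disMax(scores, 0.0));
        System.out.println("max + 0.001 * rest = " + disMax(scores, 0.001));
    }
}
```

With these illustrative numbers, summing lets the correlated body match push the document up by a full point, while the tie-breaker variant keeps the score close to the title match alone. That is the inflation effect the reply is warning about.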