lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Froh <msf...@gmail.com>
Subject Re: Scoring Across Multiple Fields
Date Mon, 27 Jan 2020 22:17:21 GMT
Hi John,

A TermQuery produces a scorer that can compute similarity for a given term
value against a given field, in the context of the index, so as you say, it
produces a score for one field.

If you want to match a given term value across multiple fields, indeed you
could use a BooleanQuery with the TermQueries in SHOULD clauses. The
vanilla BooleanQuery produces a score which is the sum of all matching
clauses' scores (or at least that's the interpretation I get from reading
the source code of the explain() method in BooleanWeight).

You can also look into DisjunctionMaxQuery, which works like a disjunctive
BooleanQuery, but it returns the maximum score across matching clauses. The
idea here is that if, say, you're matching across title and body fields, a
title match may score higher (perhaps because it's been boosted). If you
sum the scores across fields, you're likely just inflating those title
matches even more (since a title match is probably highly correlated with a
body match). (The DisjunctionMaxQuery also has a an optional
"tieBreakerMultiplier" property that you can use to weight the scoring
somewhere between pure max and pure sum -- like "Use the maximum score,
plus 0.001 times the sum of the rest".)

Hope that helps,
Michael

On Mon, 27 Jan 2020 at 13:37, John Brown <brown.john@temple.edu> wrote:

> Hi,
>
> I have a question regarding how Lucene computes document similarities from
> field similarities.
>
> Lucene's scoring documentation mentions that scoring works on fields and
> combines the results to return documents. I'm assuming fields are given
> scores, and those scores are simply averaged to return the document score?
>
> If this is the case, then in order to incorporate multiple fields in my
> scoring, I would use multiple term queries that contain the same term, but
> target different fields, then I would simply put them in a boolean query,
> and search my index using this boolean query.
>
> Am I going about this in the correct way? Any clarification would be
> greatly appreciated.
>
> Thank you,
> John B
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message