lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <>
Subject [jira] [Commented] (LUCENE-8311) Leverage impacts for phrase queries
Date Tue, 22 May 2018 12:53:00 GMT


Adrien Grand commented on LUCENE-8311:

Unfortunately I don't think this is due to this scoring issue, but rather to the fact that
a single position of a given term is allowed to be part of several matches in sloppy phrases.
For instance, suppose the query is {{"the fox"~4}}, and {{the}} and {{fox}} have respective term
frequencies of 5 and 1. For an exact phrase, we can assume that the maximum frequency is 1
(the min of both freqs). But for a sloppy phrase query, we could get a frequency of 4 if a
document has 5 occurrences of {{the}} at position {{N}} (as synonyms of each other) and 1
occurrence of {{fox}} at position {{N+1}}. Yet documents that reach this maximum frequency do
not occur in practice, so the score upper bounds that we compute are significantly higher than
the scores that are actually computed, and no block of documents is ever skipped as
non-competitive.
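
To make the consequence concrete, here is a small Java illustration that plugs the frequencies from the example (1 as the exact-phrase bound, 4 as the sloppy worst case) into a score bound and a block-skipping check. It is a sketch under stated assumptions, not Lucene code: the BM25-style saturation in `tfScore`, the `K1`/`B` constants, the length ratio and the minimum competitive score of 1.2 are all made-up values, and the class and method names are hypothetical.

```java
// Illustrative sketch (not Lucene code): why a loose frequency bound prevents
// block skipping. The BM25-style tf saturation and all constants are assumptions.
final class SloppyBoundSketch {
    static final double K1 = 1.2; // assumed saturation parameter
    static final double B = 0.75; // assumed length-normalization weight

    // BM25-style tf factor: grows with freq but saturates as freq increases.
    static double tfScore(double freq, double lengthRatio) {
        return freq * (K1 + 1) / (freq + K1 * (1 - B + B * lengthRatio));
    }

    // Block-max style check: a block may be skipped only when its best
    // possible score is below the current minimum competitive score.
    static boolean canSkipBlock(double blockMaxScore, double minCompetitiveScore) {
        return blockMaxScore < minCompetitiveScore;
    }

    public static void main(String[] args) {
        double lengthRatio = 1.0;            // assumed: doc length equals the average
        int exactBoundFreq = Math.min(5, 1); // exact phrase: min of the term freqs
        int sloppyBoundFreq = 4;             // sloppy worst case from the example
        double minCompetitiveScore = 1.2;    // hypothetical threshold from the top-k heap

        double exactBound = tfScore(exactBoundFreq, lengthRatio);
        double sloppyBound = tfScore(sloppyBoundFreq, lengthRatio);
        System.out.printf("exact bound %.3f, skip=%b%n",
            exactBound, canSkipBlock(exactBound, minCompetitiveScore));
        System.out.printf("sloppy bound %.3f, skip=%b%n",
            sloppyBound, canSkipBlock(sloppyBound, minCompetitiveScore));
    }
}
```

With these assumed numbers the exact-phrase bound is 1.0 and the block can be skipped, while the sloppy bound is about 1.69 and the block is never skipped, even though the documents that actually exist would score like the exact case.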

> Leverage impacts for phrase queries
> -----------------------------------
>                 Key: LUCENE-8311
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8311.patch
> Now that we expose raw impacts, we could leverage them for phrase queries.
> For instance, for exact phrases, we could take the minimum term frequency for each unique
> norm value in order to get upper bounds of the score for the phrase.
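
The exact-phrase idea from the issue description can be sketched in Java as below. This is a minimal illustration, not the code in LUCENE-8311.patch: `Impact` is a hypothetical stand-in record modeled loosely on Lucene's per-block impacts (a `freq` paired with a `norm`, where a lower norm compared as unsigned counts as better), `maxFreqForNorm` and `mergeExactPhraseImpacts` are illustrative names, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch: upper-bound exact-phrase impacts from the impacts of each term.
final class ExactPhraseImpactsSketch {
    // Hypothetical stand-in: every document (f, n) in the block is dominated by
    // some impact (freq, norm) with freq >= f and norm <= n (unsigned).
    record Impact(int freq, long norm) {}

    // Largest frequency this term can have in a document whose norm is `norm`.
    static int maxFreqForNorm(List<Impact> termImpacts, long norm) {
        int max = 0;
        for (Impact impact : termImpacts) {
            if (Long.compareUnsigned(impact.norm(), norm) <= 0) {
                max = Math.max(max, impact.freq());
            }
        }
        return max;
    }

    // For each unique norm value, an exact phrase cannot occur more often than
    // its least frequent term, so take the minimum across terms.
    static List<Impact> mergeExactPhraseImpacts(List<List<Impact>> perTermImpacts) {
        TreeSet<Long> norms = new TreeSet<Long>(Long::compareUnsigned);
        for (List<Impact> impacts : perTermImpacts) {
            for (Impact impact : impacts) {
                norms.add(impact.norm());
            }
        }
        List<Impact> merged = new ArrayList<>();
        for (long norm : norms) {
            int minFreq = Integer.MAX_VALUE;
            for (List<Impact> impacts : perTermImpacts) {
                minFreq = Math.min(minFreq, maxFreqForNorm(impacts, norm));
            }
            // Skip norms where some term cannot occur (bound 0), and entries whose
            // larger norm does not raise the frequency (they are dominated).
            if (minFreq > 0
                && (merged.isEmpty() || minFreq > merged.get(merged.size() - 1).freq())) {
                merged.add(new Impact(minFreq, norm));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up impacts for one block of "the" and "fox".
        List<Impact> the = List.of(new Impact(3, 10), new Impact(5, 20));
        List<Impact> fox = List.of(new Impact(1, 5), new Impact(2, 30));
        System.out.println(mergeExactPhraseImpacts(List.of(the, fox)));
    }
}
```

For the made-up impacts in `main`, the merged bounds are (freq 1, norm 10) and (freq 2, norm 30). As the comment above explains, this per-norm minimum only holds for exact phrases; a sloppy phrase can exceed it because one position may take part in several matches.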

This message was sent by Atlassian JIRA
