lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <>
Subject [jira] [Commented] (LUCENE-6425) Move extractTerms to Weight
Date Wed, 15 Apr 2015 16:40:59 GMT


Adrien Grand commented on LUCENE-6425:

bq.  the plumbing involved there should only happen from FuzzyLikeThisQuery... 

I have tests failing that do not use FuzzyLikeThisQuery. If you apply this patch and comment
out the reset of totalTermFreq, then e.g. HighlighterSearchTests.testGetFuzzyFragments will
fail. If I understand the issue correctly, the FuzzyQuery is rewritten against the main index
using SCORING_BOOLEAN_REWRITE, which in turn creates a TermQuery using the
`TermQuery(Term t, TermContext states)` constructor (which sets the docFreq explicitly to 3).
Then, for highlighting purposes, the query is executed against the memory index, which has a
ttf and a df of 1. So the following line of TermQuery.createWeight is called:
{{if (docFreq != -1) termState.setDocFreq(docFreq);}}
This in turn breaks an assertion in TermStatistics' constructor:
{{assert totalTermFreq == -1 || totalTermFreq >= docFreq; // #positions must be >= #postings}}
because we overrode the doc freq but did not update the total term freq to a consistent value.
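
To make the inconsistency concrete, here is a minimal standalone sketch of the invariant in plain Java. It does not use Lucene: the class TermStatsSketch, its nested Stats type and both helper methods are hypothetical names made up for illustration, and it assumes that "resetting" totalTermFreq means setting it to -1 (unknown), which the quoted assertion permits. The numbers are the ones from the failing test (memory index df=1 and ttf=1, docFreq of 3 carried over from the main index).

```java
// Hypothetical stand-in for the statistics check described above; not Lucene code.
final class TermStatsSketch {

    /** Per-term statistics; a totalTermFreq of -1 means "unknown". */
    static final class Stats {
        final long docFreq;
        final long totalTermFreq;

        Stats(long docFreq, long totalTermFreq) {
            // Same condition as the quoted assertion: #positions must be >= #postings.
            if (totalTermFreq != -1 && totalTermFreq < docFreq) {
                throw new IllegalStateException(
                    "totalTermFreq=" + totalTermFreq + " < docFreq=" + docFreq);
            }
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** Overrides only the doc freq and keeps the index's ttf, which can become inconsistent. */
    static Stats overrideDocFreqOnly(long indexDocFreq, long indexTtf, long queryDocFreq) {
        long df = (queryDocFreq != -1) ? queryDocFreq : indexDocFreq;
        return new Stats(df, indexTtf);
    }

    /** Overrides the doc freq and resets ttf to -1 (assumed meaning of "reset"). */
    static Stats overrideDocFreqAndResetTtf(long indexDocFreq, long indexTtf, long queryDocFreq) {
        if (queryDocFreq != -1) {
            return new Stats(queryDocFreq, -1);
        }
        return new Stats(indexDocFreq, indexTtf);
    }

    public static void main(String[] args) {
        try {
            overrideDocFreqOnly(1, 1, 3);
        } catch (IllegalStateException e) {
            System.out.println("inconsistent stats: " + e.getMessage());
        }
        Stats ok = overrideDocFreqAndResetTtf(1, 1, 3);
        System.out.println("consistent stats: df=" + ok.docFreq + " ttf=" + ok.totalTermFreq);
    }
}
```

With only the doc freq overridden, the check sees ttf=1 < df=3 and fails. Once the ttf is reset alongside it, the stats pass the check again, at the cost of no longer knowing the total term freq.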

> Move extractTerms to Weight
> ---------------------------
>                 Key: LUCENE-6425
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6425.patch, LUCENE-6425.patch
> Today we have extractTerms on Query, but it is supposed to only be called after the query
> has been specialized to a given IndexReader using Query.rewrite(IndexReader) to allow some
> complex queries to replace terms "matchers" with actual terms (e.g. WildcardQuery).
> However, we already have an abstraction for IndexReader-specialized queries: Weight.
> So I think it would make more sense to have extractTerms on Weight. This would also remove
> the trap of calling extractTerms on a query which is not rewritten yet.
> Since Weights know whether scores are needed or not, I also hope this would help
> improve the extractTerms semantics. We currently have 2 use-cases for extractTerms: distributed
> IDF and highlighting. While the former only cares about terms which are used for scoring,
> it could make sense to highlight terms that were used for matching, even if they did not contribute
> to the score (e.g. if wrapped in a ConstantScoreQuery or a BooleanQuery FILTER clause). So
> highlighters could do searcher.createNormalizedWeight(query, false).extractTerms(termSet)
> to get all terms that were used for matching the query, while distributed IDF would instead
> do searcher.createNormalizedWeight(query, true).extractTerms(termSet) to get scoring terms.

This message was sent by Atlassian JIRA
