lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: How can I use QueryScorer() to find only perfect matches??
Date Thu, 18 Mar 2010 12:51:17 GMT
Unfortunately, the highlighter (and, I think, also the fast vector highlighter)
can return fragments that, taken together, do not match the
query (e.g., they show only one of the two required terms).

I really don't like that they do this.

Ideally (to me) the entire excerpt (i.e., all fragments appended
together) should match the original query. That is, I should see at least one
occurrence of each required term, although those occurrences could fall
in different fragments.
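The criterion above can be sketched as a small check. This is not Highlighter code: `ExcerptCheck` and `coversRequired` are hypothetical names, the required terms are assumed to be known already (and lowercased), and a naive lowercase/split tokenizer stands in for the field's real analyzer.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Lucene): does the whole excerpt match the required terms?
final class ExcerptCheck {
    private ExcerptCheck() {}

    // Naive stand-in for the field's analyzer: lowercase, split on non-alphanumerics.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) {
                out.add(tok);
            }
        }
        return out;
    }

    // True if every required term occurs at least once in SOME fragment;
    // the occurrences may be spread across different fragments.
    static boolean coversRequired(List<String> fragments, Set<String> requiredTerms) {
        Set<String> seen = new HashSet<>();
        for (String fragment : fragments) {
            seen.addAll(tokens(fragment));
        }
        return seen.containsAll(requiredTerms);
    }
}
```

For example, fragments "a term is" and "the query parser" together satisfy +term +query, while "a term is" and "the term again" do not.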

Progress has been made in general. E.g., it used to be the case that if
you highlighted a phrase query, such as "president obama", you could see
excerpts that had only one of the words. That has been fixed by
defaulting to QueryScorer.
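To make the phrase case concrete, here is a toy sketch contrasting a term-only test, which lets a fragment containing just "president" through, with a position-aware test that requires the words at consecutive positions. QueryScorer works from position (span) information inside Lucene; this code only mimics the idea on plain strings. `PhraseCheck` is a hypothetical class, phrase words are assumed lowercase, and the tokenizer is a naive stand-in for an analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration (not Highlighter code): term-only vs. position-aware matching.
final class PhraseCheck {
    private PhraseCheck() {}

    // Naive stand-in for an analyzer: lowercase, split on non-alphanumerics, keep token order.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) {
                out.add(tok);
            }
        }
        return out;
    }

    // Term-only view: true if ANY phrase word occurs, so a "president"-only fragment passes.
    static boolean containsAnyTerm(String fragment, List<String> phrase) {
        List<String> toks = tokens(fragment);
        for (String word : phrase) {
            if (toks.contains(word)) {
                return true;
            }
        }
        return false;
    }

    // Position-aware view: true only if the phrase words occur at consecutive positions, in order.
    static boolean containsPhrase(String fragment, List<String> phrase) {
        return Collections.indexOfSubList(tokens(fragment), phrase) >= 0;
    }
}
```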

Really fixing this for all queries is not easy; there was a long
discussion here:

I think we should improve the Scorer API so that it can optionally
provide positional details of all matches, probably by absorbing
Span*Query back into their non-span counterparts and enriching the
API.  But this is a biggish change.

Maybe as a stopgap you could pull many fragments from the highlighter and
then pick a set of fragments that covers the most unique terms?
Sort of like a coord factor, but for highlighting rather than BooleanQuery. Is
it only required clauses you need to fix?
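The stopgap could look roughly like the greedy pass below: given candidate fragments (for instance the output of `Highlighter.getBestFragments`, assumed here to be plain text and ordered best-first), repeatedly take the fragment that adds the most not-yet-covered query terms. `FragmentCover.pick` is a hypothetical helper, the tokenizer is a naive stand-in for the analyzer, and greedy selection only approximates the best cover; it is not an exact set-cover solution.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical post-processing step (not part of Lucene): coord-like fragment selection.
final class FragmentCover {
    private FragmentCover() {}

    // Naive stand-in for the field's analyzer: lowercase, split on non-alphanumerics.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) {
                out.add(tok);
            }
        }
        return out;
    }

    // Greedily picks up to maxFragments candidates, each time taking the one that
    // covers the most query terms not covered yet. Stops early when all terms are
    // covered or no remaining candidate adds a new term.
    static List<String> pick(List<String> candidates, Set<String> queryTerms, int maxFragments) {
        Set<String> uncovered = new HashSet<>(queryTerms);
        List<String> remaining = new ArrayList<>(candidates);
        List<String> chosen = new ArrayList<>();
        while (chosen.size() < maxFragments && !uncovered.isEmpty()) {
            String best = null;
            int bestGain = 0;
            for (String frag : remaining) {
                Set<String> gained = tokens(frag);
                gained.retainAll(uncovered);
                // Strict '>' keeps the earlier (higher-ranked) fragment on ties.
                if (gained.size() > bestGain) {
                    bestGain = gained.size();
                    best = frag;
                }
            }
            if (best == null) {
                break; // nothing left adds a new term
            }
            chosen.add(best);
            remaining.remove(best);
            uncovered.removeAll(tokens(best));
        }
        return chosen;
    }
}
```

With query terms {term, query}, a single fragment "query and term" is chosen over three fragments that each contain only one of the terms; if no single fragment has both, the pass takes one fragment per term, in the highlighter's original order.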


On Thu, Mar 18, 2010 at 5:43 AM, chris.stodola <> wrote:
> Hi Erick,
> I did as recommended and changed the query appropriately, but the result is
> still the same.
> Page 78 of the book "Lucene in Action" explains how scoring
> works, which is why I get more results than the exact match I was expecting.
> But how can I highlight, in a large document, only the results identified by a
> certain query like +contents:term +contents:query?
> Are there any alternatives to the QueryScorer method? Any examples? Any
> papers I should read first?
> thx
> christian
> Erick Erickson wrote:
>> Try +contents:term +contents:query. By misplacing the
>> '+' you're getting the default OR operator and the '+'
>> is probably being thrown away by the analyzer.
>> Luke will help here a lot.
>> HTH
>> Erick
>> On Mon, Mar 15, 2010 at 9:46 AM, christian stadler <> wrote:
>>> Hi there,
>>> I have an issue with the QueryScorer(query) method at the moment and I need
>>> some assistance.
>>> I indexed my e-book "Lucene in Action" and, based on this index, I
>>> started to play around with some boolean queries like:
>>> (contents:+term contents:+query)
>>> For the phrase "term query" I expect four hits as perfect matches.
>>> But when I run my sample to highlight this phrase in context, I get a
>>> lot more results: it also finds all the matches for "term" and "query"
>>> independently.
>>> I think the problem is QueryScorer(), which softens the original exact
>>> boolean query.
>>> Then I tried the following:
>>> private static Highlighter GetHits(Query query, Formatter formatter)
>>> {
>>>    string field = "contents";
>>>    BooleanQuery termsQuery = new BooleanQuery();
>>>    // false: skip prohibited (MUST_NOT) terms so they are not turned into MUST clauses
>>>    WeightedTerm[] terms = QueryTermExtractor.GetTerms(query, false, field);
>>>    foreach (WeightedTerm term in terms)
>>>    {
>>>        // re-add every extracted term as a required (MUST) clause
>>>        TermQuery termQuery = new TermQuery(new Term(field, term.GetTerm()));
>>>        termsQuery.Add(termQuery, BooleanClause.Occur.MUST);
>>>    }
>>>    // create query scorer based on term queries (field specific)
>>>    QueryScorer scorer = new QueryScorer(termsQuery);
>>>    Highlighter highlighter = new Highlighter(formatter, scorer);
>>>    highlighter.SetTextFragmenter(new SimpleFragmenter(20));
>>>    return highlighter;
>>> }
>>> to rewrite the query and change each clause from SHOULD to MUST,
>>> but the result was the same.
>>> Do you have an example of how I can use QueryScorer() to exactly
>>> mimic a boolean search?
>>> thanks in advance
>>> Christian
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
> --
> Sent from the Lucene - Java Developer mailing list archive at