lucene-dev mailing list archives

From Tatu Saloranta <>
Subject Re: Highlighting Redux
Date Fri, 21 Mar 2003 05:05:42 GMT
On Thursday 20 March 2003 11:12, Leander Harding wrote:
> looks at the terms they contain and highlights them all. Consider the
> following query:
> ("foo" AND "bar") OR "baz"
> Suppose that we search using this query and the following document is a
> hit: <doc>Foo.....quux......baz.</doc>
> Which Terms do we highlight?
> All of the existing highlighting code I've seen would highlight both "foo"
> and "baz", but this isn't correct - the document contains "foo", but no
> "bar", thus, since "foo" in the query is part of an AND expression that
> wasn't satisfied by this document, only "baz" should be highlighted.
> So my questions three, are thus:
>     What's the best way to go about this?
>     Has anyone been working on anything similar?
>     Is there already API to make this possible that I'm overlooking?

I think some of the proposed/planned changes would make implementing this
a bit easier (see the mailing list archives for the discussion). However,
there is a slight difficulty in "reverse engineering" the 'and' and 'or'
relationships from the query itself (backtracking from the Query object to
see how required/prohibited/optional terms form ANDed/ORed groups).
None of the proposed solutions would easily give you that grouping
information, I think.

A similar problem is highlighting phrase hits; they too cannot be
highlighted correctly using just the set of individual terms that occur.
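
To illustrate the phrase problem, here is a minimal sketch in plain Python. It is independent of Lucene's API; the function name and the token-list representation are assumptions made for illustration only. It marks a token for highlighting only when it sits inside a full occurrence of the phrase, which needs position information rather than a bare set of terms.

```python
from typing import List, Set


def phrase_positions(tokens: List[str], phrase: List[str]) -> Set[int]:
    """Return the token positions that belong to a full phrase occurrence.

    Hypothetical helper: `tokens` is the document as an ordered token list,
    `phrase` is the ordered list of terms in the phrase query.
    """
    hits: Set[int] = set()
    n = len(phrase)
    for start in range(len(tokens) - n + 1):
        # Only a contiguous, in-order match counts as a phrase hit.
        if tokens[start:start + n] == phrase:
            hits.update(range(start, start + n))
    return hits


tokens = ["new", "york", "is", "not", "new", "jersey"]
positions = phrase_positions(tokens, ["new", "york"])
```

Here `positions` contains only 0 and 1. A term-set highlighter would also mark the second "new" at position 4, even though it is not part of the phrase.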

What you will probably end up doing is rebuilding the query tree and
evaluating its branches, pruning the ones that do not result in a hit, and
then using that (optimized) tree for highlighting.
Then again, that (the evaluation part) is already done by the search
functionality... so perhaps you could reuse parts of it?
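
The prune-then-highlight idea could be sketched roughly as follows. This is a plain-Python illustration, not Lucene code: the `Term`, `And` and `Or` node classes and the `prune`/`highlight_terms` functions are hypothetical names, and the document is simplified to a set of lowercased terms. Prohibited (NOT) clauses and phrases are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Union


@dataclass
class Term:
    text: str


@dataclass
class And:
    children: List["Node"]


@dataclass
class Or:
    children: List["Node"]


Node = Union[Term, And, Or]


def prune(node: Node, doc_terms: Set[str]) -> Optional[Node]:
    """Return the part of the query tree this document satisfies, or None."""
    if isinstance(node, Term):
        return node if node.text in doc_terms else None
    pruned = [prune(child, doc_terms) for child in node.children]
    if isinstance(node, And):
        # An AND group survives only if every one of its children matched.
        if any(p is None for p in pruned):
            return None
        return And(pruned)
    # An OR group keeps whichever children matched, if any did.
    kept = [p for p in pruned if p is not None]
    return Or(kept) if kept else None


def collect_terms(node: Optional[Node]) -> Set[str]:
    """Gather the term texts left in a (possibly empty) pruned tree."""
    if node is None:
        return set()
    if isinstance(node, Term):
        return {node.text}
    terms: Set[str] = set()
    for child in node.children:
        terms |= collect_terms(child)
    return terms


def highlight_terms(query: Node, doc_terms: Set[str]) -> Set[str]:
    """Terms to highlight: those in branches that actually produced the hit."""
    return collect_terms(prune(query, doc_terms))


# ("foo" AND "bar") OR "baz" against a document containing foo, quux, baz
query = Or([And([Term("foo"), Term("bar")]), Term("baz")])
to_highlight = highlight_terms(query, {"foo", "quux", "baz"})
```

For the example from the original question, `to_highlight` is `{"baz"}`: the AND branch is pruned because "bar" is missing, so "foo" is not highlighted.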

-+ Tatu +-
