lucene-dev mailing list archives

From "Leander Harding" <lhard...@mitre.org>
Subject Highlighting Redux
Date Thu, 20 Mar 2003 18:12:30 GMT
Hi,

    Yes, it's another question about Term highlighting. Essentially, what
I'm looking to do is obtain the set of term positions in a given document
that are hits for a given Query. I've read the archives and looked at the
contributed code, but it all fails in one respect that is important to my
employer: it doesn't understand the semantics of Lucene queries. Instead, it
looks at the terms they contain and highlights them all. Consider the
following query:
("foo" AND "bar") OR "baz"
Suppose that we search using this query and the following document is a hit:
<doc>Foo.....quux......baz.</doc>
Which Terms do we highlight?
All of the existing highlighting code I've seen would highlight both "foo"
and "baz", but this isn't correct. The document contains "foo" but no "bar",
so the AND expression that "foo" belongs to isn't satisfied by this
document, and only "baz" should be highlighted.
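
To make the desired behaviour concrete, here is a minimal sketch. It assumes a hypothetical query tree (QueryNode, TermNode, AndNode and OrNode are made-up names, not Lucene's Query classes) and a naive lowercase tokenizer standing in for an Analyzer. It handles only AND/OR over plain terms and reports the matching terms rather than their positions, so treat it as an illustration of the semantics, not a proposed implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical query tree for illustration only; these are NOT Lucene classes.
interface QueryNode {
    // Returns the terms that contributed to a match; an empty set means "no match".
    Set<String> matchedTerms(Set<String> docTerms);
}

final class TermNode implements QueryNode {
    private final String term;
    TermNode(String term) { this.term = term; }
    public Set<String> matchedTerms(Set<String> docTerms) {
        return docTerms.contains(term)
                ? Collections.singleton(term)
                : Collections.<String>emptySet();
    }
}

final class AndNode implements QueryNode {
    private final List<QueryNode> clauses;
    AndNode(QueryNode... clauses) { this.clauses = Arrays.asList(clauses); }
    public Set<String> matchedTerms(Set<String> docTerms) {
        Set<String> result = new HashSet<String>();
        for (QueryNode clause : clauses) {
            Set<String> matched = clause.matchedTerms(docTerms);
            // One unsatisfied clause means the whole AND contributes nothing.
            if (matched.isEmpty()) {
                return Collections.<String>emptySet();
            }
            result.addAll(matched);
        }
        return result;
    }
}

final class OrNode implements QueryNode {
    private final List<QueryNode> clauses;
    OrNode(QueryNode... clauses) { this.clauses = Arrays.asList(clauses); }
    public Set<String> matchedTerms(Set<String> docTerms) {
        // Only the clauses that are themselves satisfied contribute terms.
        Set<String> result = new HashSet<String>();
        for (QueryNode clause : clauses) {
            result.addAll(clause.matchedTerms(docTerms));
        }
        return result;
    }
}

final class HighlightSketch {
    // Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
    static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    static Set<String> termsToHighlight(QueryNode query, String text) {
        return query.matchedTerms(tokenize(text));
    }

    public static void main(String[] args) {
        // ("foo" AND "bar") OR "baz"
        QueryNode query = new OrNode(
                new AndNode(new TermNode("foo"), new TermNode("bar")),
                new TermNode("baz"));
        // Prints [baz]: "foo" is present but its AND clause is unsatisfied.
        System.out.println(termsToHighlight(query, "Foo.....quux......baz."));
    }
}
```

A highlighter built along these lines would then look up positions only for the terms returned, instead of for every term mentioned in the query.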
So my questions three are thus:
    What's the best way to go about this?
    Has anyone been working on anything similar?
    Is there already an API to make this possible that I'm overlooking?
My immediate reaction is to simply grab the source for Lucene and start
hacking, but before I do that I'd be interested to know how likely it is
that the resulting changes would get rolled into the main source tree
reasonably quickly. Copyright shouldn't be a problem, but my employer has
been reluctant to let me go this route for fear of ending up supporting a
patch themselves.

   Peace,
        -Leander

Leander Harding | lharding@mitre.org