lucene-dev mailing list archives

From mark harwood <>
Subject Re: Query.extractTerms - a poor introspection API?
Date Thu, 06 Apr 2006 18:33:32 GMT
> It's still the case that you often need to know what
> type of query the
> parent is.

For highlighting purposes I typically don't need/want
to concern myself too much with precisely interpreting
the specifics of all Query logic:
* For Boolean queries the "mustNot" terms typically
don't appear in the field being highlighted.  
* Phrase/span queries present additional complication
when summarising docs and selecting fragments of text
so I don't attempt to deal with them.

For these reasons the highlighter doesn't truly
reflect the actual matches made by the Query - it's
simply too hard to balance support for summarization
AND complex nested boolean logic AND complex proximity
tests (i.e. spans).
However, using just terms, term query boosts and term
IDF, the highlighter makes a reasonable approximation
of the matches. The two Query methods I
proposed (getQueries/getTerms) would be sufficient to
support this model.
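
As a rough illustration of the "boost and IDF" weighting described above, here is a minimal sketch. The class and method names are hypothetical and not part of Lucene. The IDF formula is assumed to be the classic 1 + ln(numDocs / (docFreq + 1)) form. The highlighter's real scoring may differ in detail.

```java
// Hypothetical helper for illustration only; not a Lucene class.
final class TermWeights {
    private TermWeights() {}

    // Assumed classic IDF form: 1 + ln(numDocs / (docFreq + 1)).
    // Rare terms (low docFreq) get a higher value than common ones.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // A term's highlight weight: its query boost scaled by its IDF.
    static double weight(double boost, int docFreq, int numDocs) {
        return boost * idf(docFreq, numDocs);
    }
}
```

With this weighting a rare, boosted term dominates fragment selection, while a term that occurs in most documents contributes little.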

> IMO, the query hierarchy should be fully
> self-describable... user code
> should be able to walk it

Is this not already the case? Don't most query objects
provide some ability to view contained subqueries, e.g.
BooleanQuery.getClauses()? This obviously requires the
client to have prior knowledge of all the specific
container types in order to traverse them.

What we are considering here is what might be a
generally useful level of abstraction which allows
code to traverse Query hierarchies without specialized
knowledge of all the container query types.
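
To make the idea concrete, here is a minimal sketch of such an abstraction. The QueryNode interface and the Leaf, Container and QueryWalker classes are hypothetical stand-ins, not Lucene types. The method names follow the getQueries/getTerms proposal, but their signatures are assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical abstraction: names follow the proposal, signatures are assumed.
interface QueryNode {
    List<QueryNode> getQueries(); // direct sub-queries; empty for leaf queries
    List<String> getTerms();      // terms held directly by this query
}

// Stand-in for a term-level query.
final class Leaf implements QueryNode {
    private final String term;
    Leaf(String term) { this.term = term; }
    public List<QueryNode> getQueries() { return Collections.emptyList(); }
    public List<String> getTerms() { return Collections.singletonList(term); }
}

// Stand-in for any container query (boolean, span, and so on).
final class Container implements QueryNode {
    private final List<QueryNode> children;
    Container(QueryNode... children) { this.children = Arrays.asList(children); }
    public List<QueryNode> getQueries() { return children; }
    public List<String> getTerms() { return Collections.emptyList(); }
}

final class QueryWalker {
    private QueryWalker() {}

    // Depth-first walk that never inspects the concrete query type.
    static List<String> collectTerms(QueryNode root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(QueryNode node, List<String> out) {
        out.addAll(node.getTerms());
        for (QueryNode child : node.getQueries()) {
            collect(child, out);
        }
    }
}
```

The walker relies only on the two methods, so a new container type would need no change to client code. The trade-off is that type-specific details (for example, whether a clause is prohibited) are not visible at this level.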

For the purposes of highlighting I would suggest the
API I have outlined is sufficient given the challenges
I mentioned earlier but I'd be interested to hear what
others might need from such an API if we were to
consider adding it. 

