lucene-dev mailing list archives

From Tatu Saloranta <>
Subject Re: Query Term Collector (was: Re: New highlighter package available)
Date Mon, 06 Oct 2003 03:24:42 GMT
On Sunday 05 October 2003 03:15, wrote:
> Here are some very important reasons why getTerms() shouldn't be added as a
> method to Query:
> Query objects are seen by Lucene users as reusable objects.

I think that is a good point from design/architecture perspective.

> These query types are important distinctions to preserve and the getTerms()
> proposal doesn't respect these subtle differences in query usage.

It'd be good to keep in mind, though, that it's possible to implement this
without requiring queries to have state. The term collector's state could be
passed in either when the query is actually executed, or in a separate step. The
former would be more efficient, in the sense that Terms would probably need to
be collected only once; the latter would allow cleaner separation.
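
To illustrate the "separate step" variant, here is a minimal sketch in which the query keeps no collection state and the caller supplies the collector. All type names here (TermCollector, CollectableQuery, SimpleTermQuery, SimpleBooleanQuery, SetTermCollector) are hypothetical and are not Lucene classes; the real Query hierarchy would need its own design.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical types for illustration only; none of these are Lucene classes.
interface TermCollector {
    void collect(String field, String text);
}

interface CollectableQuery {
    // The query stores nothing about the collection; the caller owns the state.
    void collectTerms(TermCollector collector);
}

final class SimpleTermQuery implements CollectableQuery {
    private final String field;
    private final String text;

    SimpleTermQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }

    public void collectTerms(TermCollector collector) {
        collector.collect(field, text);
    }
}

final class SimpleBooleanQuery implements CollectableQuery {
    private final List<CollectableQuery> clauses;

    SimpleBooleanQuery(CollectableQuery... clauses) {
        this.clauses = Arrays.asList(clauses.clone());
    }

    public void collectTerms(TermCollector collector) {
        // Delegate to each clause; no per-call state is kept on the query.
        for (CollectableQuery clause : clauses) {
            clause.collectTerms(collector);
        }
    }
}

// All mutable state lives in the collector, one instance per highlighting pass.
final class SetTermCollector implements TermCollector {
    private final Set<String> terms = new LinkedHashSet<>();

    public void collect(String field, String text) {
        terms.add(field + ":" + text);
    }

    Set<String> terms() {
        return terms;
    }
}

class TermCollectionSketch {
    public static void main(String[] args) {
        CollectableQuery query = new SimpleBooleanQuery(
                new SimpleTermQuery("body", "lucene"),
                new SimpleTermQuery("body", "highlighter"));
        SetTermCollector collector = new SetTermCollector();
        query.collectTerms(collector);
        System.out.println(collector.terms());
    }
}
```

Because the collector is an argument rather than a field, the same query object can be reused across searches and highlighting passes without one pass affecting another.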

When this was discussed last time, I suggested that perhaps the overhead of the
second pass is generally not a huge issue, mostly since highlighting is usually
only done for one document. But more importantly, it would be good to measure
exactly how long a separate Term collection phase would take for some realistic
index and matching queries, and compare that to the actual query execution time.
Then we can decide whether the performance overhead of scanning Terms twice is
significant enough to be an issue.
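
The comparison could be set up roughly as follows. This is only a sketch: PhaseTimer and its methods are hypothetical helpers, the two Runnables in main are empty stand-ins for the real collection and search calls, and a meaningful measurement would need a realistic index, representative queries, and JVM warm-up before timing.

```java
// Hypothetical timing helper; not part of Lucene.
final class PhaseTimer {
    // Runs the task `runs` times and returns the mean wall-clock nanoseconds per run.
    static double meanNanos(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) runs;
    }

    // Cost of the separate collection pass as a fraction of query execution cost.
    static double overheadRatio(double collectNanos, double searchNanos) {
        return collectNanos / searchNanos;
    }
}

class PhaseTimerSketch {
    public static void main(String[] args) {
        // Stand-ins: replace with the actual Term collection and search calls.
        Runnable collectTerms = () -> { };
        Runnable executeQuery = () -> { };

        double collect = PhaseTimer.meanNanos(collectTerms, 1000);
        double search = PhaseTimer.meanNanos(executeQuery, 1000);
        System.out.println("collect ns/run: " + collect);
        System.out.println("search ns/run: " + search);
    }
}
```

With real workloads plugged in, PhaseTimer.overheadRatio(collect, search) gives the number the decision would hinge on.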

-+ Tatu +-
