lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Improving TimeLimitedCollector
Date Wed, 24 Jun 2009 08:55:50 GMT
But TimeLimitingCollector's logic is coded in its collect() method. The top
scorer calls nextDoc() or advance() on all its sub-scorers, and calls
collect() only when a match is found.

If we want the sub-scorers to check whether they should abort, we'd need to
revamp (liked the word :)) TimeLimitingCollector into something like the
CheckAbort that SegmentMerger uses. I.e., the top scorer would pass such an
instance to its sub-scorers, which would call TimeLimit.check() or something
similar, and if the time limit has expired that call would throw a
TimeExceededException (like TLC does).
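
To make that concrete, here is a rough sketch of what such a shared check
object could look like. All names here (TimeLimit, SketchTimeExceededException,
SketchSubScorer) are made up for illustration and are not existing Lucene
classes; the injected clock is also my assumption, added only so the behavior
can be exercised without sleeping.

```java
import java.util.function.LongSupplier;

// Hypothetical exception, standing in for the one TLC throws.
class SketchTimeExceededException extends RuntimeException {
    SketchTimeExceededException(long allowedMs, long elapsedMs) {
        super("Elapsed " + elapsedMs + " ms, allowed " + allowedMs + " ms");
    }
}

// Hypothetical shared object the top scorer would hand to its sub-scorers.
class TimeLimit {
    private final LongSupplier clockMs; // source of "now" in milliseconds
    private final long startMs;
    private final long allowedMs;

    TimeLimit(long allowedMs, LongSupplier clockMs) {
        this.clockMs = clockMs;
        this.allowedMs = allowedMs;
        this.startMs = clockMs.getAsLong();
    }

    // Throws once more than allowedMs has elapsed since construction.
    void check() {
        long elapsedMs = clockMs.getAsLong() - startMs;
        if (elapsedMs > allowedMs) {
            throw new SketchTimeExceededException(allowedMs, elapsedMs);
        }
    }
}

// Stand-in for a sub-scorer: iterates a fixed doc id list and checks the
// limit on every nextDoc(), i.e. before any match reaches collect().
class SketchSubScorer {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;
    private final TimeLimit limit;
    private int pos = -1;

    SketchSubScorer(int[] docs, TimeLimit limit) {
        this.docs = docs;
        this.limit = limit;
    }

    int nextDoc() {
        limit.check();
        pos++;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}
```

In real use the clock would be something like System::currentTimeMillis, and
checking on every nextDoc() call may well be too expensive, so a real version
would probably check only every N calls.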

We can enable this by adding another parameter to IndexSearcher that says
whether searches should be limited by time, and what the time limit is. It
will then instantiate that object and pass it to its Scorer and so on. I
think such a parameter should not exist on individual search methods, since
it's more of a global setting (i.e., I want my searches to be limited to 5
seconds, always, not just for a particular query). Right?

Another option would be to add a setTimeout method on Query, which the Query
will use when it constructs its Scorer. The shortcoming of this is that if I
want to use someone else's query which did not implement setTimeout, then
I'll need to build a TimeOutQueryWrapper that wraps a Query and implements
the timeout logic, but that gets complicated.

The Collector approach makes the most sense to me, since it's the only
object I fully control in the search process. I cannot control Query
implementations, and I cannot control the decisions made by IndexSearcher.
But I can always wrap someone else's Collector with TLC and pass it to
search().
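
The wrapping idea, reduced to a sketch. SketchCollector is a one-method
stand-in I made up for Lucene's Collector, and the wrapper and exception
names are hypothetical too; the real TLC has more to it than this. The
injected clock is again just an assumption to keep the example testable.

```java
import java.util.function.LongSupplier;

// Hypothetical minimal stand-in for a Collector: one callback per match.
interface SketchCollector {
    void collect(int doc);
}

// Hypothetical exception carrying the doc on which the limit was noticed.
class SketchCollectTimeoutException extends RuntimeException {
    final int lastDoc;

    SketchCollectTimeoutException(int lastDoc) {
        super("Time limit exceeded at doc " + lastDoc);
        this.lastDoc = lastDoc;
    }
}

// Wraps any collector and aborts once the allowed time has passed.
// The check runs only inside collect(), so time spent in sub-scorers
// between matches goes unnoticed until the next match arrives.
class SketchTimeLimitingCollector implements SketchCollector {
    private final SketchCollector delegate;
    private final LongSupplier clockMs;
    private final long deadlineMs;

    SketchTimeLimitingCollector(SketchCollector delegate, long allowedMs,
                                LongSupplier clockMs) {
        this.delegate = delegate;
        this.clockMs = clockMs;
        this.deadlineMs = clockMs.getAsLong() + allowedMs;
    }

    @Override
    public void collect(int doc) {
        if (clockMs.getAsLong() > deadlineMs) {
            throw new SketchCollectTimeoutException(doc);
        }
        delegate.collect(doc);
    }
}
```

The caller keeps its own collector, wraps it, and hands the wrapper to the
search; no Query or searcher code has to change, which is the whole appeal.
The comment on collect() is also exactly the limitation Jason is pointing at.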

Shai

On Wed, Jun 24, 2009 at 12:26 AM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> As we're revamping collectors, weights, and scorers, perhaps we
> can push time limiting into the individual subscorers? Currently
> on a boolean query, we're timing out the query at the top level
> which doesn't work well if the subqueries exceed the time limit.
>
