lucene-dev mailing list archives

From eks dev <eks...@yahoo.co.uk>
Subject Re: Improving TimeLimitedCollector
Date Wed, 24 Jun 2009 09:13:05 GMT
Re: "I think such a parameter should not exist on individual search methods
since it's more of a global setting (i.e., I want my searches to be
limited to 5 seconds, always, not just for a particular query). Right?"

I am not sure about this one. We had cases where one physical index served two logical indices
with different requirements for clients. Having the timeout settable per Query is nice to have.


At the end of the day, such a timeout gives you a quality/time trade-off setting:
"if you need all results, be ready to wait longer and set a longer timeout"
"if you need SOME results quickly, then reduce this timeout"

Ideally, that should be the user's decision.




________________________________
From: Shai Erera <serera@gmail.com>
To: java-dev@lucene.apache.org
Sent: Wednesday, 24 June, 2009 10:55:50
Subject: Re: Improving TimeLimitedCollector


But TimeLimitingCollector's logic is coded in its collect() method. The top scorer calls nextDoc()
or advance() on all its sub-scorers, and only when a match is found does it call collect().

If we want the sub-scorers to check whether they should abort, we'd need to revamp (liked
the word :)) TimeLimitingCollector into something like the CheckAbort that SegmentMerger uses. I.e.,
the top scorer would pass such an instance to its sub-scorers, which would call a TimeLimit.check()
or something similar, and if the time limit has expired this call would throw a TimeExceededException
(like TLC).
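
A minimal sketch of that idea, using plain Java stand-ins rather than real Lucene classes. TimeLimit, TimeLimitExceededException and SketchSubScorer are hypothetical names chosen for illustration; the only point is that every sub-scorer shares one deadline object and checks it on each nextDoc() call, not only when collect() is reached.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical exception, modeled on the TimeExceededException mentioned above. */
class TimeLimitExceededException extends RuntimeException {
    TimeLimitExceededException(String message) {
        super(message);
    }
}

/** Hypothetical shared deadline object; this is not a Lucene class. */
class TimeLimit {
    private final long deadlineNanos;

    TimeLimit(long timeoutMillis) {
        this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    /** Throws once the deadline has passed. */
    void check() {
        // Compare by subtraction so the test stays correct if nanoTime() wraps around.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new TimeLimitExceededException("search time limit exceeded");
        }
    }
}

/** Stand-in for a sub-scorer that iterates over a fixed list of doc ids. */
class SketchSubScorer {
    private final int[] docs;
    private final TimeLimit limit;
    private int pos = -1;

    SketchSubScorer(int[] docs, TimeLimit limit) {
        this.docs = docs;
        this.limit = limit;
    }

    /** Returns the next doc id, or -1 when exhausted; checks the limit before advancing. */
    int nextDoc() {
        limit.check();
        pos++;
        return pos < docs.length ? docs[pos] : -1;
    }
}
```

The top scorer would create one TimeLimit per search and hand the same instance to each sub-scorer, so a slow sub-query aborts even if it never produces a match.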

We can enable this by adding another parameter to IndexSearcher that says whether searches should be
limited by time, and what the time limit is. It will then instantiate that object and pass
it to its Scorer and so on. I think such a parameter should not exist on individual search
methods since it's more of a global setting (i.e., I want my searches to be limited to 5 seconds,
always, not just for a particular query). Right?

Another option would be to add a setTimeout method on Query, which will use it when it constructs
its Scorer. The shortcoming of this is that if I want to use someone else's query which did
not implement setTimeout, then I'll need to build a TimeOutQueryWrapper that will wrap a Query
and implement the timeout logic, and that gets complicated.

The Collector approach makes the most sense to me, since it's the only object I fully
control in the search process. I cannot control Query implementations, and I cannot control
the decisions made by IndexSearcher. But I can always wrap someone else's Collector with TLC
and pass it to search().
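
The wrapping pattern can be sketched roughly as follows. SketchCollector is a hypothetical one-method stand-in for Lucene's Collector, and TimeLimitedWrapper and SketchTimeExceededException are illustrative names, not the actual TimeLimitingCollector implementation. Note that the deadline is only checked inside collect(), which is exactly the limitation discussed above.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified stand-in for Lucene's Collector. */
interface SketchCollector {
    void collect(int doc);
}

/** Hypothetical exception thrown when the time budget is used up. */
class SketchTimeExceededException extends RuntimeException {
    SketchTimeExceededException(String message) {
        super(message);
    }
}

/** Wraps any collector and aborts collection once the deadline has passed. */
class TimeLimitedWrapper implements SketchCollector {
    private final SketchCollector delegate;
    private final long deadlineNanos;

    TimeLimitedWrapper(SketchCollector delegate, long timeoutMillis) {
        this.delegate = delegate;
        this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    @Override
    public void collect(int doc) {
        // The check runs only when a match reaches the collector.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new SketchTimeExceededException("time limit exceeded at doc " + doc);
        }
        delegate.collect(doc);
    }
}
```

Because the wrapper takes the timeout in its constructor, the caller can pick a different limit for each search, whatever Query or searcher is involved.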

Shai


On Wed, Jun 24, 2009 at 12:26 AM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:

As we're revamping collectors, weights, and scorers, perhaps we
can push time limiting into the individual subscorers? Currently
on a boolean query, we're timing out the query at the top level
which doesn't work well if the subqueries exceed the time limit.


      