lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shai Erera <>
Subject Re: Improving TimeLimitedCollector
Date Thu, 25 Jun 2009 12:59:50 GMT
When this thread was focused on Scorers only, I wanted to offer the

* Have a TimeLimitingQuery which will wrap another Query, and delegate all
the calls to it, except queryWeight, which will create TimeLimitingWeight.
* Have a TimeLimitingWeight which will wrap the original query's
QueryWeight, and impl its scorer(...) as: (1) create a Timeout object and
(2) call origQueryWeight.scorer(..., timeout).
* Add to QueryWeight.scorer() another parameter Timeout.

IndexSearcher will pass a null Timeout (or a NopTimeout) and if
TimeLimitingQuery is used by the user, it will create a true Timeout object.

Then, a Scorer impl can choose between ignoring the Timeout checks (for
which case we'll still have TimeLimitingCollector as a fallback), or call
periodically (every nextDoc(), advance()), timeout.check(), which will throw
TimeExceededException if the timeout has reached.

I then thought that if timeout is not required by an application, it will
pay the extrac checks for timeout != null, or in case of NopTimeout -
calling its empty check(), unnecessarily. Today, w/ TimeLimitingCollector,
if I don't instantiate it, I don't suffer from any extra overhead.

Mark, TimeLimitingIndexReader will basically have the same problem, only for
a wider range of operations. I.e., if I'm not interested in timeouts, any
operation that will be invoked will check for timeout. Are we ok with that
(e.g. adding null check or NopTimeout checks)?

If we are, then how about introducing a static Timeout class, with a
ThreadLocal member. Timeout.check() will invoke the ThreadLocal Timeout
member and check for timeout. The user will do
Timeout.start(MAX_ALLOWED_DURATION), which will set the ThreadLocal Timeout
and then any checks done by this thread will be timeout-able? At the end of
the operation, the user will need to do Timeout.cancel() or something like

With this approach, we can discard TimeLimitingIndexReader, which we won't
need to create versions for Multi, Parallel etc., and use it wherever we
want. Or ... we can keep TimeLimitingIndexReader, and have it accept another
IndexReader, and override all the reasonable methods from IndexReader, doing
something like:
try {
} finally {

That will be more convenient for the application I think.


On Thu, Jun 25, 2009 at 1:35 AM, Jason Rutherglen <> wrote:

> This would be good however how would we obtain the thread? I
> believe this would require using a ThreadLocalish type of system
> which could be quite slow (to obtain the thread and lookup in
> the hashmap).
> One implementation I looked at before was to add this check in
> IndexReader.isDeleted (by overriding inside FilterIndexReader).
> Perhaps SegmentTermDocs is the best place for this, it could
> check a timeout boolean variable which can (somehow) be
> optional, meaning, the code doesn't always check this and may
> also use a different call path. The boolean variable wouldn't be
> volatile as I believe based on our previous discussions is
> slower to check than a regular variable. Hopefully this would be
> good enough and more accurate than our current implementation.
> On Wed, Jun 24, 2009 at 3:07 PM, Mark Harwood <>wrote:
>>  I think the Collector approach makes the most sense to me, since it's the
>>> only object I fully control in the search process. I cannot control Query
>>> implementations, and I cannot control the decisions made by IndexSearcher.
>>> But I can always wrap someone else's Collector with TLC and pass it to
>>> search().
>> Isn't another option to have a TimeLimitedIndexReader?
>> Clients would need to make a call before the start of any timeout-able
>> activity e.g.
>>    timeLimitedReader.startActivity(MAX_ALLOWED_DURATION);  //sets a
>> ThreadLocal
>>   //run query etc
>> All invocations on special IndexReader, TermEnum, TermDocs etc subclasses
>> can then periodically check to see if the calling thread has exceeded its
>> allotted time. This would be a potentially non-invasive way of making all
>> filters', queries', scorers' etc read activity subject to some time limit as
>> they all typically have to invoke IndexReader-related classes relatively
>> frequently as part of their processing.
>> Not sure how quick you can make all the underlying time checks though.
>> Cheers,
>> Mark
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:

View raw message