lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class
Date Thu, 11 Feb 2010 12:18:27 GMT


Shai Erera commented on LUCENE-1720:

I like the idea of adding the projected activity timeout in general, but I'd like to question
its usefulness in practice (or at least for search applications). The way I see it (and
it might be because I'm thinking of my own use case), there are two problems with such an API:
# It might not be easy (if possible at all), or cheap, to project how much of the work has
been done. For TermQuery it might be easy to tell (e.g. numSeenSoFar / df(term)), but
that will add an 'if' to every document that is traversed, and possibly more operations. For
more complicated queries, I'm not sure you'll be able to tell how much of the query has
been processed.
# If I am willing to tolerate a 10s query, then I guess I'd want to extract as much information
as I can in those 10s. If after 1s I realize I haven't processed even 10% of the data, that
doesn't mean I'd like to stop, right? Maybe the query/activity will speed up shortly? I think
that if I put a cap on the query time, it means I don't mind spending that amount of time
... but I also recognize this may depend on the application, and therefore this is not too
strong an argument.
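
To make the first point concrete, here is a minimal sketch of a linear projection, assuming
progress can be measured as numSeenSoFar / df(term) as in the TermQuery example. The class
ProjectedTimeout and its method names are hypothetical and are not part of the patch.

```java
/** Hypothetical helper (not in the LUCENE-1720 patch): linear projection of total run time. */
final class ProjectedTimeout {
    private ProjectedTimeout() {}

    /**
     * Projects the total time as elapsed * docFreq / numSeenSoFar, rounded up.
     * Returns Long.MAX_VALUE when nothing has been seen yet, since there is
     * no basis for a projection.
     */
    static long projectedTotalMillis(long elapsedMillis, long numSeenSoFar, long docFreq) {
        if (elapsedMillis < 0 || docFreq <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be >= 0 and docFreq > 0");
        }
        if (numSeenSoFar <= 0) {
            return Long.MAX_VALUE;
        }
        if (numSeenSoFar >= docFreq) {
            return elapsedMillis; // all work done, nothing left to project
        }
        // Integer arithmetic avoids floating-point rounding; throws on overflow.
        long product = Math.multiplyExact(elapsedMillis, docFreq);
        return product / numSeenSoFar + (product % numSeenSoFar == 0 ? 0 : 1);
    }

    /** True if the linear projection exceeds the caller's time budget. */
    static boolean projectedToOverrun(long elapsedMillis, long numSeenSoFar,
                                      long docFreq, long budgetMillis) {
        return projectedTotalMillis(elapsedMillis, numSeenSoFar, docFreq) > budgetMillis;
    }
}
```

With 1000 ms elapsed, 5 of 100 documents seen and a 10000 ms budget, projectedToOverrun
returns true. That is the situation described in the second point: the projection would stop
a query that still has 9s of its budget left and might yet speed up.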

I think this approach is interesting, as it is able to detect 'hanging' threads (such as those
stuck in infinite loops).

I realize, however, that ActivityTimeMonitor is not search-specific (which makes me think it
should be moved to o.a.l.util or something), and therefore the projected activity timeout may
be useful in other places.

How about we do it in a separate issue? We still need to write enough tests for what exists
so far, and turn the Benchmark class into a benchmark task/alg. I think that if we can avoid
extra functionality (which is likely to add more bugs to cover), it will be easier to finish
this issue, no?
BTW, in order to support this we'll need to store the startTime as well, not just the timeoutTime,
which means that we either add another startTimesThreads map, or change the map to be from
Thread to a Times object which encapsulates both times ... Minor thing though.
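
A rough sketch of the second option, a single map from Thread to a Times object. Everything
here (the Times fields, the ActivityTimes class and its methods) is assumed for illustration
and does not reflect the actual ActivityTimeMonitor code in the patch. The current time is
passed in explicitly only to keep the sketch easy to test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical value object holding both timestamps for one monitored activity. */
final class Times {
    final long startTime;   // when the activity started, in millis
    final long timeoutTime; // absolute deadline, in millis

    Times(long startTime, long timeoutTime) {
        this.startTime = startTime;
        this.timeoutTime = timeoutTime;
    }
}

/** Hypothetical registry; a stand-in for the map kept by the monitor, not the patch itself. */
final class ActivityTimes {
    // One map replaces separate timeout and start-time maps.
    private final Map<Thread, Times> timesByThread = new ConcurrentHashMap<Thread, Times>();

    void start(Thread thread, long now, long maxMillis) {
        timesByThread.put(thread, new Times(now, now + maxMillis));
    }

    void stop(Thread thread) {
        timesByThread.remove(thread);
    }

    /** True if the thread is tracked and its deadline has passed. */
    boolean isTimedOut(Thread thread, long now) {
        Times times = timesByThread.get(thread);
        return times != null && now >= times.timeoutTime;
    }

    /** Elapsed millis since start, or -1 if the thread is not tracked. */
    long elapsedMillis(Thread thread, long now) {
        Times times = timesByThread.get(thread);
        return times == null ? -1L : now - times.startTime;
    }
}
```

Keeping both values in one object means a single lookup per check, and the start time needed
for any projection is available alongside the deadline.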

Also, is this targeted to be added to 'core' or contrib?

> TimeLimitedIndexReader and associated utility class
> ---------------------------------------------------
>                 Key: LUCENE-1720
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>         Attachments:,,,, LUCENE-1720.patch,,,,
> An alternative to TimeLimitedCollector that has the following advantages:
> 1) Any reader activity can be time-limited rather than just single searches e.g. the document retrieve phase.
> 2) Times out faster (i.e. runaway queries such as fuzzies detected quickly before last "collect" stage of query processing)
> Uses new utility timeout class that is independent of IndexReader.
> Initial contribution includes a performance test class but not had time as yet to work up a formal Junit test.
> TimeLimitedIndexReader is coded as JDK1.5 but can easily be undone.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.