lucene-dev mailing list archives

From Mark Harwood <markharw...@yahoo.co.uk>
Subject Re: Improving TimeLimitedCollector
Date Sat, 27 Jun 2009 11:24:15 GMT
Thanks for the feedback, Shai.

So I guess you're suggesting breaking this out into a general utility
class, e.g. something like:

class TimeLimitedThreadActivity
{
         // called by client
         public static void startTimeLimitedActivity(long maxTimePermitted);
         public static void endTimeLimitedActivity();

         // called by resources (readers/writers) that need to be shared
         // fairly by threads; throws some form of runtime exception
         public static void checkActivityNotElapsed();
}

A downside of breaking it out into static methods like this is that a
thread cannot run more than one time-limited activity at a time, but I
guess that might be a reasonable restriction.
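For illustration, a minimal runnable version of that static API might look like the following. It is a sketch under simplifying assumptions, not the prototype's code: each thread's deadline is kept in a ThreadLocal and checkActivityNotElapsed() compares it against System.nanoTime() on every call (rather than using the volatile-flag and background-thread scheme of the prototype), the limit is assumed to be in milliseconds, and ActivityTimedOutException is a hypothetical name for "some form of runtime exception". Starting a second activity on the same thread is rejected, which enforces the one-activity-per-thread restriction just described.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed static utility: one deadline per thread.
class TimeLimitedThreadActivity {
    // Deadline of the current thread's activity (System.nanoTime() units); null if none.
    private static final ThreadLocal<Long> DEADLINE = new ThreadLocal<Long>();

    // Called by client code before starting a time-limited activity.
    public static void startTimeLimitedActivity(long maxTimePermittedMillis) {
        if (DEADLINE.get() != null) {
            // Static methods mean at most one activity per thread at a time.
            throw new IllegalStateException("Thread already has a time-limited activity");
        }
        DEADLINE.set(System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxTimePermittedMillis));
    }

    // Called by client code, typically from a finally block.
    public static void endTimeLimitedActivity() {
        DEADLINE.remove();
    }

    // Called by shared resources (readers/writers); throws once the deadline has passed.
    public static void checkActivityNotElapsed() {
        Long deadline = DEADLINE.get();
        // Compare via the difference so that nanoTime() wrap-around is handled.
        if (deadline != null && System.nanoTime() - deadline >= 0) {
            throw new ActivityTimedOutException();
        }
    }
}

// Hypothetical unchecked exception type (an assumption, not an existing Lucene class).
class ActivityTimedOutException extends RuntimeException {
    ActivityTimedOutException() {
        super("Time-limited activity elapsed");
    }
}
```

A caller would pair the calls as `startTimeLimitedActivity(limit); try { ... } finally { endTimeLimitedActivity(); }`, so that a thrown timeout never leaves a stale deadline attached to the thread.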


 >>Aside, how about using a PQ for the threads' times, or a TreeMap?
That will save looping over the collection to find the next candidate.
Just an implementation detail though.

Yep, that was one of the rough edges; I just wanted to get raw
timings first for all the "is timed out?" checks we're injecting
into reader calls.

Cheers
Mark


On 27 Jun 2009, at 11:37, Shai Erera wrote:

> I like the overall approach. However, it's very local to an
> IndexReader. That is, if someone wanted to limit other operations
> (say, indexing), or did not use an IndexReader (for a Scorer impl,
> maybe), they could not reuse it.
>
> What if we factor out the timeout logic into a Timeout class (I think
> it can be a static class, given the way you implemented it) and use
> it in TimeLimitingIndexReader? That class could offer a check() method
> which does the internal logic (the 'if' check and throwing the
> exception). It would be similar to the current ensureOpen() call
> that precedes an operation.
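Shai's suggestion might be sketched like this. Everything here is hypothetical: the Timeout holder keeps a per-thread deadline, start()/stop() and the millisecond unit are assumptions, and TimeLimitingIterator is a stand-in for a reader, writer or Scorer wrapper. Each wrapped call runs Timeout.check() before delegating, much as many IndexReader methods call ensureOpen() first.

```java
import java.util.Iterator;

// Hypothetical Timeout holder: the 'if' check and the throw live in one static
// check() that any component (reader, writer, Scorer...) can call.
final class Timeout {
    // Per-thread deadline in System.nanoTime() units; null when no limit is active.
    private static final ThreadLocal<Long> DEADLINE = new ThreadLocal<Long>();

    private Timeout() {}

    static void start(long maxMillis) {
        DEADLINE.set(System.nanoTime() + maxMillis * 1_000_000L);
    }

    static void stop() {
        DEADLINE.remove();
    }

    static void check() {
        Long deadline = DEADLINE.get();
        if (deadline != null && System.nanoTime() - deadline >= 0) {
            throw new RuntimeException("Timeout exceeded");
        }
    }
}

// Hypothetical wrapper: every call checks the timeout before delegating.
class TimeLimitingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;

    TimeLimitingIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        Timeout.check();
        return delegate.hasNext();
    }

    @Override
    public T next() {
        Timeout.check();
        return delegate.next();
    }

    @Override
    public void remove() {
        Timeout.check();
        delegate.remove();
    }
}
```

Because the check is a plain static call, the same Timeout could guard an indexing loop or a custom Scorer without going through an IndexReader at all.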
>
> It might be considered more expensive, since it calls a check()
> method instead of just reading a boolean, but it will be more
> reusable. Besides, ensureOpen() is already a method call today, so I
> don't think Timeout.check() is that bad. We could even later create a
> TimeLimitingIndexWriter and document the Timeout class for use by
> external code.
>
> Aside, how about using a PQ for the threads' times, or a TreeMap?
> That will save looping over the collection to find the next
> candidate. Just an implementation detail though.
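One way to apply the PQ idea, sketched with java.util.PriorityQueue (the DeadlineQueue and Activity names are hypothetical): running activities are ordered by deadline, so the next candidate to time out is simply the head of the queue rather than the result of a loop over every thread.

```java
import java.util.PriorityQueue;

// Hypothetical registry of running time-limited activities, ordered so that the
// activity with the earliest deadline is always at the head of the queue.
class DeadlineQueue {
    static final class Activity implements Comparable<Activity> {
        final Thread thread;
        final long deadlineNanos; // System.nanoTime() units

        Activity(Thread thread, long deadlineNanos) {
            this.thread = thread;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public int compareTo(Activity other) {
            // Compare via the difference so nanoTime() wrap-around is handled.
            return Long.signum(deadlineNanos - other.deadlineNanos);
        }
    }

    private final PriorityQueue<Activity> queue = new PriorityQueue<Activity>();

    synchronized void add(Activity activity) {
        queue.add(activity);
    }

    synchronized boolean remove(Activity activity) {
        return queue.remove(activity);
    }

    // Next candidate to time out, found without scanning every thread; null if none.
    synchronized Activity next() {
        return queue.peek();
    }
}
```

peek() is constant time and add() is O(log n), but removing an arbitrary entry from a PriorityQueue is linear; if most activities finish before their deadline (so removals dominate), a TreeMap keyed by deadline, with a tie-breaker for equal deadlines, keeps removal at O(log n) too.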
>
> Shai
>
> On Sat, Jun 27, 2009 at 3:31 AM, Mark Harwood  
> <markharw00d@yahoo.co.uk> wrote:
> Going back to my post re TimeLimitedIndexReaders, here's an
> incomplete but functional prototype:
>
> http://www.inperspective.com/lucene/TimeLimitedIndexReader.java
> http://www.inperspective.com/lucene/TestTimeLimitedIndexReader.java
>
>
> The principle is that all reader accesses check a volatile variable
> indicating that something may have timed out (no need to check thread
> locals, etc.). If, and only if, a timeout has been noted, the thread
> locals are checked to see which thread should throw a timeout exception.
>
> All time-limited use of the reader must be wrapped in try...finally
> blocks that mark the start and end of a timed set of activities. A
> background thread maintains the next anticipated timeout deadline
> and simply waits until it is reached or until the list of planned
> activities changes with new deadlines.
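The scheme in the two paragraphs above might be sketched as follows. This is a hypothetical reconstruction from the description, not the code at the URLs above: TimeoutMonitor, start()/stop()/check(), the millisecond unit and the exception message are all assumptions, and the deadlines live in a HashMap that the monitor scans linearly (the spot where a PQ or TreeMap would help).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the volatile-flag scheme; names and units are assumptions.
class TimeoutMonitor {
    // Fast path: stays false until some registered deadline has passed.
    private static volatile boolean anyTimedOut = false;
    // The calling thread's deadline, in System.nanoTime() units.
    private static final ThreadLocal<Long> MY_DEADLINE = new ThreadLocal<Long>();
    // Deadlines of all running activities (one per thread); guarded by LOCK.
    private static final Map<Thread, Long> DEADLINES = new HashMap<Thread, Long>();
    private static final Object LOCK = new Object();

    static {
        Thread monitor = new Thread(TimeoutMonitor::monitorLoop, "timeout-monitor");
        monitor.setDaemon(true); // do not keep the JVM alive
        monitor.start();
    }

    // Marks the start of a timed set of activities on this thread.
    static void start(long maxMillis) {
        long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        MY_DEADLINE.set(deadline);
        synchronized (LOCK) {
            DEADLINES.put(Thread.currentThread(), deadline);
            LOCK.notifyAll(); // planned activities changed: recompute the next deadline
        }
    }

    // Marks the end of the timed activities; call from a finally block.
    static void stop() {
        MY_DEADLINE.remove();
        synchronized (LOCK) {
            DEADLINES.remove(Thread.currentThread());
            LOCK.notifyAll();
        }
    }

    // Called on every reader access.
    static void check() {
        if (!anyTimedOut) {
            return; // common case: one volatile read, no thread-local lookup
        }
        Long deadline = MY_DEADLINE.get();
        if (deadline != null && System.nanoTime() - deadline >= 0) {
            throw new RuntimeException("Time limit exceeded");
        }
    }

    // Background thread: waits until the earliest future deadline or a change.
    private static void monitorLoop() {
        synchronized (LOCK) {
            while (true) {
                long now = System.nanoTime();
                boolean expired = false;
                long waitNanos = Long.MAX_VALUE;
                for (long d : DEADLINES.values()) { // linear scan over all activities
                    long remaining = d - now;
                    if (remaining <= 0) {
                        expired = true;
                    } else {
                        waitNanos = Math.min(waitNanos, remaining);
                    }
                }
                anyTimedOut = expired;
                try {
                    if (waitNanos == Long.MAX_VALUE) {
                        LOCK.wait(); // nothing pending: wait for start()/stop()
                    } else {
                        LOCK.wait(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
                    }
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }
}
```

A caller brackets its reader work with `TimeoutMonitor.start(limit)` and a finally block calling `TimeoutMonitor.stop()`; stop() also wakes the monitor, so the flag drops back to false once the overrunning activity has ended and other threads return to the single-volatile-read fast path.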
>
>
> Performance seems reasonable on my Wikipedia index:
>
> // Some tests for heavy use of TermEnum/TermDocs
> Read term docs for 200000 terms in 4755 ms using no timeout limit (warm up)
> Read term docs for 200000 terms in 4320 ms using no timeout limit (warm up)
> Read term docs for 200000 terms in 4320 ms using no timeout limit
> Read term docs for 200000 terms in 4388 ms using reader with time-limited access
>
> // Example query with heavy use of TermEnum/TermDocs
> +text:f* +text:a* +text:b* no time limit matched 1090041 docs in 2000 ms
> +text:f* +text:a* +text:b* time limited collector matched 1090041 docs in 1963 ms
> +text:f* +text:a* +text:b* time limited reader matched 1090041 docs in 2121 ms
>
> // Example fuzzy match burning CPU reading TermEnum
> text:accomodation~0.5 no time limit matched 192084 docs in 6428 ms
> text:accomodation~0.5 time limited collector matched 192084 docs in 5923 ms
> text:accomodation~0.5 time limited reader matched 192084 docs in 5945 ms
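Putting rough numbers on the overhead from the single-run timings above:

```latex
\begin{aligned}
\text{term docs, reader vs.\ no limit:} &\quad \frac{4388 - 4320}{4320} \approx +1.6\% \\
\text{f*/a*/b* query, reader vs.\ collector:} &\quad \frac{2121 - 1963}{1963} \approx +8.0\% \\
\text{fuzzy query, reader vs.\ collector:} &\quad \frac{5945 - 5923}{5923} \approx +0.4\%
\end{aligned}
```

In the fuzzy test the no-limit run (6428 ms) was slower than both time-limited runs, which suggests run-to-run variation of several percent, so differences of a few percent here may be within noise.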
>
>
> The reader approach to limiting time is slower, but it has these
> advantages:
>
> 1) Multiple reader activities can be time-limited, rather than just
> single searches
> 2) No code changes are required to scorers/queries/filters etc.
> 3) Tasks that spend a lot of time burning CPU before collection
> happens can be killed earlier
>
> I'm sure there are some thread-safety issues to work through in my
> code, and not all reader classes are wrapped (e.g. TermPositions), but
> the basics are there and seem to be working.
>
> Thoughts?
>

