lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Earwin Burrfoot <ear...@gmail.com>
Subject Re: Improving TimeLimitedCollector
Date Sat, 27 Jun 2009 16:47:07 GMT
Why don't you use Thread.interrupt(), .isInterrupted() ?

On Sat, Jun 27, 2009 at 16:16, Shai Erera<serera@gmail.com> wrote:
>> A downside of breaking it out into static methods like this is that a
>> thread cannot run >1 time-limited activity simultaneously but I guess that
>> might be a reasonable restriction.
>
> I'm not sure I understand that - how can a thread run >1 activity
> simultaneously anyway, and how's your impl in TimeLimitingIndexReader allows
> it to do so? You use the thread as a key to the map. Am I missing something?
>
> Anyway, I think we can let go of the static methods and make them instance
> methods. I think .. if I want to use time limited activities, I should
> create a TimeLimitedThreadActivity instance and pass it around, to
> TimeLimitingIndexReader (and maybe in the future to a similar **IndexWriter)
> and any other custom code I have which I want to put a time limit on.
>
> A static class has the advantage of not needing to pass it around
> everywhere, and is accessible from everywhere, so that if we discover that
> limiting on IndexReader is not enough, and we want some of the scorers to
> check more frequently if they should stop, we won't need to pass that
> instance all the way down to them.
>
> I don't mind keeping it static, but I also don't mind if it will be an
> instance passed around, since currently it's only passed to IndexReader.
>
> Are you going to open an issue for that? Seems like a nice addition to me.
> Do you think it should belong in core or contrib? If 'core' then if that's
> possible I'd like to see all timeout classes under one package, including
> TimeLimitingCollector (which until 2.9 we can safely move around as we
> want).
> I don't mind working on that w/ you, if you want.
>
> Shai
>
> On Sat, Jun 27, 2009 at 2:24 PM, Mark Harwood <markharw00d@yahoo.co.uk>
> wrote:
>>
>> Thanks for the feedback, Shai.
>> So I guess you're suggesting breaking this out into a general utility
>> class e.g. something like:
>> class TimeLimitedThreadActivity
>> {
>>         //called by client
>>         public static void startTimeLimitedActivity(long
>> maxTimePermitted).
>>         public static void endTimeLimitedActivity()
>>        //called by resources (reader/writers) that need to be shared
>> fairly by threads
>>       public static void checkActivityNotElapsed(); //throws some form of
>> runtime exception
>> }
>> A downside of breaking it out into static methods like this is that a
>> thread cannot run >1 time-limited activity simultaneously but I guess that
>> might be a reasonable restriction.
>>
>> >>Aside, how about using a PQ for the threads' times, or a TreeMap. That
>> >> will save looping over the collection to find the next candidate. Just an
>> >> implementation detail though.
>> Yep, that was one of the rough edges - I just wanted to get raw timings
>> first for the all the "is timed out?" checks we're injecting into reader
>> calls.
>> Cheers
>> Mark
>>
>> On 27 Jun 2009, at 11:37, Shai Erera wrote:
>>
>> I like the overall approach. However it's very local to an IndexReader.
>> I.e., if someone wanted to limit other operations (say indexing), or does
>> not use an IndexReader (for a Scorer impl maybe), one cannot reuse it.
>>
>> What if we factor out the timeout logic to a Timeout class (I think it can
>> be a static class, with the way you implemented it) and use it in
>> TimeLimitingIndexReader? That class can offer a method check() which will do
>> the internal logic (the 'if' check and throw exception). It will be similar
>> to the current ensureOpen() followed by an operation.
>>
>> It might be considered more expensive since it won't check a boolean, but
>> instead call a check() method, but it will be more reusable. Also,
>> ensureOpen today is also a method call, so I don't think Timeout.check() is
>> that bad. We can even later create a TimeLimitingIndexWriter and document
>> Timeout class for other usage by external code.
>>
>> Aside, how about using a PQ for the threads' times, or a TreeMap. That
>> will save looping over the collection to find the next candidate. Just an
>> implementation detail though.
>>
>> Shai
>>
>> On Sat, Jun 27, 2009 at 3:31 AM, Mark Harwood <markharw00d@yahoo.co.uk>
>> wrote:
>>>
>>> Going back to my post re TimeLimitedIndexReaders - here's an incomplete
>>> but functional prototype:
>>> http://www.inperspective.com/lucene/TimeLimitedIndexReader.java
>>> http://www.inperspective.com/lucene/TestTimeLimitedIndexReader.java
>>>
>>> The principle is that all reader accesses check a volatile variable
>>> indicating something may have timed out (no need to check thread locals
>>> etc.) If and only if a time out has been noted threadlocals are checked to
>>> see which thread should throw a timeout exception.
>>> All time-limited use of reader must be wrapped in try...finally calls to
>>> indicate the start and stop of a timed set of activities. A background
>>> thread maintains the next anticipated timeout deadline and simply waits
>>> until this is reached or the list of planned activities changes with new
>>> deadlines.
>>>
>>> Performance seems reasonable on my Wikipedia index:
>>> //some tests for heavy use of termenum/term docs
>>> Read term docs for 200000 terms  in 4755 ms using no timeout limit (warm
>>> up)
>>> Read term docs for 200000 terms  in 4320 ms using no timeout limit (warm
>>> up)
>>> Read term docs for 200000 terms  in 4320 ms using no timeout limit
>>> Read term docs for 200000 terms  in 4388 ms using  reader with
>>> time-limited access
>>> //Example query with heavy use of termEnum/termDocs
>>> +text:f* +text:a* +text:b* no time limit matched 1090041 docs in 2000 ms
>>> +text:f* +text:a* +text:b* time limited collector matched 1090041 docs in
>>> 1963 ms
>>> +text:f* +text:a* +text:b* time limited reader matched 1090041 docs in
>>> 2121 ms
>>> //Example fuzzy match burning CPU reading TermEnum
>>> text:accomodation~0.5 no time limit matched 192084 docs in 6428 ms
>>> text:accomodation~0.5 time limited collector matched 192084 docs in 5923
>>> ms
>>> text:accomodation~0.5 time limited reader matched 192084 docs in 5945 ms
>>>
>>> The reader approach to limiting time is slower but has these advantages :
>>> 1) Multiple reader activities can be time-limited rather than just single
>>> searches
>>> 2) No code changes required to scorers/queries/filters etc
>>> 3) Tasks that spend plenty of  time burning CPU before collection happens
>>> can be killed earlier
>>> I'm sure there's some thread safety issues to work through in my code and
>>> not all reader classes are wrapped (e.g. TermPositions) but the basics are
>>> there and seem to be functioning
>>> Thoughts?
>>
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message