From: Mark Harwood
To: java-dev@lucene.apache.org
Subject: Re: Improving TimeLimitedCollector
Date: Sat, 27 Jun 2009 01:31:18 +0100
In-Reply-To: <85d3c3b60906231426h6eed45ecg1921152f29c90286@mail.gmail.com>

Going back to my post re TimeLimitedIndexReaders - here's an incomplete but functional prototype:

http://www.inperspective.com/lucene/TimeLimitedIndexReader.java
http://www.inperspective.com/lucene/TestTimeLimitedIndexReader.java

The principle is that all reader accesses check a volatile variable indicating that something may have timed out (no need to check thread locals etc.). If and only if a timeout has been noted, thread locals are checked to see which thread should throw a timeout exception.

All time-limited use of a reader must be wrapped in try...finally calls to indicate the start and stop of a timed set of activities. A background thread maintains the next anticipated timeout deadline and simply waits until this is reached or the list of planned activities changes with new deadlines.

Performance seems reasonable on my Wikipedia index:

//some tests for heavy use of termenum/term docs
Read term docs for 200000 terms in 4755 ms using no timeout limit (warm up)
Read term docs for 200000 terms in 4320 ms using no timeout limit (warm up)
Read term docs for 200000 terms in 4320 ms using no timeout limit
Read term docs for 200000 terms in 4388 ms using reader with time-limited access

//Example query with heavy use of termEnum/termDocs
+text:f* +text:a* +text:b* no time limit matched 1090041 docs in 2000 ms
+text:f* +text:a* +text:b* time limited collector matched 1090041 docs in 1963 ms
+text:f* +text:a* +text:b* time limited reader matched 1090041 docs in 2121 ms

//Example fuzzy match burning CPU reading TermEnum
text:accomodation~0.5 no time limit matched 192084 docs in 6428 ms
text:accomodation~0.5 time limited collector matched 192084 docs in 5923 ms
text:accomodation~0.5 time limited reader matched 192084 docs in 5945 ms

The reader approach to limiting time is slightly slower than the collector approach in the tests above but has these advantages:

1) Multiple reader activities can be time-limited rather than just single searches
2) No code changes required to scorers/queries/filters etc.
3) Tasks that spend plenty of time burning CPU before collection happens can be killed earlier

I'm sure there are some thread-safety issues to work through in my code, and not all reader classes are wrapped (e.g. TermPositions), but the basics are there and seem to be functioning.

Thoughts?

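For readers who want to see the principle described in this message in miniature, here is a rough sketch. It is not the code in TimeLimitedIndexReader.java: the class and method names (ActivityTimeLimiter, start, stop, checkTimeout, ActivityTimedOutException) are hypothetical, a map keyed by thread stands in for the thread locals mentioned in the message, and nested timed activities on one thread are not handled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names and structure are assumptions.
final class ActivityTimeLimiter {

    static final class ActivityTimedOutException extends RuntimeException {
        ActivityTimedOutException(String message) { super(message); }
    }

    private final Object lock = new Object();
    // Deadline (System.nanoTime scale) per thread with an active timed activity; guarded by lock.
    private final Map<Thread, Long> deadlines = new HashMap<>();
    // Threads whose deadline has passed but which have not yet called stop(); guarded by lock.
    private final Set<Thread> timedOut = new HashSet<>();
    // The only state read on the fast path of every reader access.
    private volatile boolean anyTimedOut = false;

    ActivityTimeLimiter() {
        Thread watcher = new Thread(this::watch, "timeout-watcher");
        watcher.setDaemon(true);
        watcher.start();
    }

    /** Marks the start of a timed set of activities for the calling thread. */
    void start(long timeoutMillis) {
        synchronized (lock) {
            deadlines.put(Thread.currentThread(), System.nanoTime() + timeoutMillis * 1_000_000L);
            lock.notifyAll(); // the set of deadlines changed, so wake the watcher
        }
    }

    /** Marks the end of the timed activities; call from a finally block. */
    void stop() {
        synchronized (lock) {
            Thread current = Thread.currentThread();
            deadlines.remove(current);
            timedOut.remove(current);
            anyTimedOut = !timedOut.isEmpty();
            lock.notifyAll();
        }
    }

    /** A wrapping reader would call this on every access. */
    void checkTimeout() {
        if (anyTimedOut) { // one volatile read unless some thread has timed out
            synchronized (lock) {
                if (timedOut.contains(Thread.currentThread())) {
                    throw new ActivityTimedOutException("timed activity exceeded its deadline");
                }
            }
        }
    }

    // Background loop: sleep until the nearest deadline or until the deadlines change.
    private void watch() {
        synchronized (lock) {
            while (true) {
                long now = System.nanoTime();
                long nearest = Long.MAX_VALUE;
                Iterator<Map.Entry<Thread, Long>> it = deadlines.entrySet().iterator();
                while (it.hasNext()) {
                    Map.Entry<Thread, Long> entry = it.next();
                    long remaining = entry.getValue() - now;
                    if (remaining <= 0) {
                        timedOut.add(entry.getKey());
                        anyTimedOut = true;
                        it.remove();
                    } else {
                        nearest = Math.min(nearest, remaining);
                    }
                }
                try {
                    if (nearest == Long.MAX_VALUE) {
                        lock.wait(); // nothing scheduled
                    } else {
                        lock.wait(nearest / 1_000_000L + 1); // +1 so we never pass 0 (wait forever)
                    }
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        ActivityTimeLimiter limiter = new ActivityTimeLimiter();
        limiter.start(50); // allow 50 ms for this thread's reader work
        try {
            while (true) {
                limiter.checkTimeout(); // stands in for repeated reader accesses
            }
        } catch (ActivityTimedOutException e) {
            System.out.println("timed out: " + e.getMessage());
        } finally {
            limiter.stop();
        }
    }
}
```

The point of the design is that the common case costs a single volatile read per access; the lock and the per-thread lookup are only touched once the watcher has flagged that some deadline has passed.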