Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-dev@lucene.apache.org
Message-Id: <DAE495F7-5CD1-4564-8F28-8B85F26600CE@yahoo.co.uk>
From: Mark Harwood <markharw00d@yahoo.co.uk>
To: java-dev@lucene.apache.org
In-Reply-To: <85d3c3b60906231426h6eed45ecg1921152f29c90286@mail.gmail.com>
Content-Type: multipart/alternative; boundary=Apple-Mail-5--679977212
Mime-Version: 1.0 (Apple Message framework v935.3)
Subject: Re: Improving TimeLimitedCollector
Date: Sat, 27 Jun 2009 01:31:18 +0100
References: <85d3c3b60906231426h6eed45ecg1921152f29c90286@mail.gmail.com>

--Apple-Mail-5--679977212
Content-Type: text/plain;
	charset=US-ASCII;
	format=flowed;
	delsp=yes
Content-Transfer-Encoding: 7bit

Going back to my post re TimeLimitedIndexReaders - here's an  
incomplete but functional prototype:

http://www.inperspective.com/lucene/TimeLimitedIndexReader.java
http://www.inperspective.com/lucene/TestTimeLimitedIndexReader.java


The principle is that every reader access checks a single volatile
variable indicating that some activity may have timed out, so the
common path never needs to consult thread locals. Only if a timeout
has been noted are the thread locals checked to see which thread
should throw a timeout exception.
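
A minimal sketch of that check, not the code from the prototype linked
above: the class, field and method names (TimeoutCheckSketch,
anyTimedOut, timedOut, checkTimeout, readSomething) are all made up
for illustration, and readSomething merely stands in for a wrapped
reader call.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutCheckSketch {
    // Hypothetical exception type; the prototype may use something else.
    static class ActivityTimedOutException extends RuntimeException {
        ActivityTimedOutException(String message) { super(message); }
    }

    // Read on every reader access. On the fast path this volatile read
    // is the only extra cost.
    static volatile boolean anyTimedOut = false;

    // Per-thread marker, consulted only once the volatile flag is raised.
    static final ThreadLocal<AtomicBoolean> timedOut =
            ThreadLocal.withInitial(AtomicBoolean::new);

    // Would be called at the top of each wrapped reader method.
    static void checkTimeout() {
        if (anyTimedOut && timedOut.get().get()) {
            throw new ActivityTimedOutException(
                    "timed out: " + Thread.currentThread().getName());
        }
    }

    // Stand-in for a wrapped reader call (assumption, not a Lucene API).
    static int readSomething(int value) {
        checkTimeout();
        return value + 1;
    }

    public static void main(String[] args) {
        System.out.println(readSomething(1)); // flag down: fast path

        // Another thread has timed out, but not this one: no exception.
        anyTimedOut = true;
        System.out.println(readSomething(2));

        // Now this thread is marked too. A monitor thread holding a
        // reference to the marker would normally do this.
        timedOut.get().set(true);
        try {
            readSomething(3);
        } catch (ActivityTimedOutException e) {
            System.out.println("caught " + e.getMessage());
        }
    }
}
```

The point of the two-level test is that threads which are not being
timed, or have not expired, pay only for the volatile read.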

All time-limited use of a reader must be wrapped in a try...finally
block whose calls mark the start and stop of a timed set of
activities. A background thread tracks the next anticipated timeout
deadline and simply waits until it is reached, or until the list of
planned activities changes and brings new deadlines.
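
The start/stop wrapping and the monitor thread could be sketched
roughly as follows. This is an illustration under assumptions, not the
prototype: all names (TimedActivitySketch, start, stop, checkTimeout,
timesOut) are hypothetical, and per-thread state is kept in a map
keyed by Thread instead of thread locals purely to keep the example
short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class TimedActivitySketch {
    static class ActivityTimedOutException extends RuntimeException {
        ActivityTimedOutException(String message) { super(message); }
    }

    private static final Object lock = new Object();
    private static final Map<Thread, Long> deadlines = new HashMap<>(); // guarded by lock
    private static final Set<Thread> expired = new HashSet<>();         // guarded by lock
    private static Thread monitor;                                       // guarded by lock
    static volatile boolean anyTimedOut = false;

    // Marks the start of a timed set of activities for the calling thread.
    static void start(long timeoutMillis) {
        synchronized (lock) {
            if (monitor == null) {
                monitor = new Thread(TimedActivitySketch::monitorLoop, "timeout-monitor");
                monitor.setDaemon(true);
                monitor.start();
            }
            deadlines.put(Thread.currentThread(),
                    System.nanoTime() + timeoutMillis * 1_000_000L);
            lock.notifyAll(); // deadlines changed: monitor recomputes its wait
        }
    }

    // Marks the end of the timed activities; always call from finally.
    static void stop() {
        synchronized (lock) {
            deadlines.remove(Thread.currentThread());
            expired.remove(Thread.currentThread());
            anyTimedOut = !expired.isEmpty();
            lock.notifyAll();
        }
    }

    // What each wrapped reader call would do before touching the index.
    static void checkTimeout() {
        if (anyTimedOut) {
            synchronized (lock) {
                if (expired.contains(Thread.currentThread())) {
                    throw new ActivityTimedOutException("time limit exceeded");
                }
            }
        }
    }

    // Waits for the nearest deadline, or for the set of deadlines to change.
    private static void monitorLoop() {
        synchronized (lock) {
            try {
                while (true) {
                    long now = System.nanoTime();
                    long minRemaining = Long.MAX_VALUE;
                    Iterator<Map.Entry<Thread, Long>> it = deadlines.entrySet().iterator();
                    while (it.hasNext()) {
                        Map.Entry<Thread, Long> e = it.next();
                        long remaining = e.getValue() - now;
                        if (remaining <= 0) {
                            expired.add(e.getKey());
                            it.remove();
                            anyTimedOut = true;
                        } else {
                            minRemaining = Math.min(minRemaining, remaining);
                        }
                    }
                    if (minRemaining == Long.MAX_VALUE) {
                        lock.wait(); // nothing scheduled
                    } else {
                        lock.wait((minRemaining + 999_999L) / 1_000_000L); // round up to >= 1 ms
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Demo: simulated reader work for workMillis under a limit of limitMillis.
    static boolean timesOut(long limitMillis, long workMillis) {
        start(limitMillis);
        try {
            long end = System.nanoTime() + workMillis * 1_000_000L;
            while (System.nanoTime() - end < 0) {
                checkTimeout();
            }
            return false; // finished within the limit
        } catch (ActivityTimedOutException e) {
            return true;
        } finally {
            stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(timesOut(50, 2000)); // work outlasts the limit
        System.out.println(timesOut(2000, 20)); // work finishes in time
    }
}
```

Because stop() runs in finally, a thread's deadline is withdrawn
whether the work completes, times out, or fails for another reason.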


Performance seems reasonable on my Wikipedia index:

//some tests for heavy use of termenum/term docs
Read term docs for 200000 terms in 4755 ms using no timeout limit (warm up)
Read term docs for 200000 terms in 4320 ms using no timeout limit (warm up)
Read term docs for 200000 terms in 4320 ms using no timeout limit
Read term docs for 200000 terms in 4388 ms using reader with time-limited access

//Example query with heavy use of termEnum/termDocs
+text:f* +text:a* +text:b* no time limit matched 1090041 docs in 2000 ms
+text:f* +text:a* +text:b* time limited collector matched 1090041 docs in 1963 ms
+text:f* +text:a* +text:b* time limited reader matched 1090041 docs in 2121 ms

//Example fuzzy match burning CPU reading TermEnum
text:accomodation~0.5 no time limit matched 192084 docs in 6428 ms
text:accomodation~0.5 time limited collector matched 192084 docs in 5923 ms
text:accomodation~0.5 time limited reader matched 192084 docs in 5945 ms


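The reader approach to limiting time is slightly slower than running
with no limit in the first two tests (4388 ms vs 4320 ms, and 2121 ms
vs 2000 ms) but has these advantages: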

1) Multiple reader activities can be time-limited, rather than just
single searches.
2) No code changes are required to scorers/queries/filters etc.
3) Tasks that spend plenty of time burning CPU before collection
happens can be killed earlier.

I'm sure there are some thread-safety issues to work through in my
code, and not all reader classes are wrapped (e.g. TermPositions), but
the basics are there and seem to be functioning.

Thoughts?

--Apple-Mail-5--679977212--