lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki>
Subject Re: search timeout
Date Sat, 17 Mar 2007 21:42:45 GMT
markharw00d wrote:
> Chris Hostetter wrote:
>> this is something anyone using the Lucene API can do as long as they 
>> use a
>> HitCollector ... the Nutch impl seems to ctually spin up a seperate 
>> thread
> I'm keen to understand the pros and cons of these two approaches.
> With the HitCollector approach is this just engineering a fall at the 
> final hurdle? It could be that long running queries spend all their 
> time doing edit-distance comparisions for a a fuzzy boolean query, 
> say  or reading TermDocs for a large range filter to create a BitSet 
> only to be aborted at the collection stage?
> Another point - I noticed in some basic timing tests that calling 
> System.currentTimeMillis() in a tight loop like for *every* call to 
> HitCollector.collect(..) could add reasonable overhead so you probably 
> only want to call this for every nth document collected when testing 
> execution times.

That's why Nutch implementation doesn't do this (I know, I wrote it ;) ).

What it does is the following (please see the patch for details):

* it creates a single (static) timer thread, which counts the "ticks", 
every couple hundred ms (configurable). It uses a volatile int counter, 
therefore avoiding the need to synchronize.

* each HitColector records the start tick count in its constructor, and 
then checks the current tick count in collect(...). If the difference is 
too large then it throws a RuntimeException (NOTE: would someone 
*please* refactor this API so that we can exit this loop more gracefully!).

This design has several benefits: it avoids creating too many timer 
threads (there is just one per JVM), it avoids the need to synchronize 
on the value being changed, and it avoids calling 

Best regards,

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message