lucene-dev mailing list archives

From robert engels <>
Subject Re: TermInfosReader lazy term index reading
Date Fri, 02 Feb 2007 22:56:14 GMT
I think that is much more involved... I don't think there is an easy  
way to move a query between threads/pools once it has started unless  
you restart the entire query.

You might, however, be able to dynamically lower the thread priority
when you detect a long query, so that smaller (faster) queries would
have priority.
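
A minimal sketch of the priority-lowering idea, in Java. The class name `QueryPriorityGuard`, the `checkpoint()`/`restore()` methods, and the time budget are hypothetical names introduced for illustration; none of them are Lucene APIs. It assumes the search code can call `checkpoint()` periodically on the thread that runs the query. Thread priorities are only a hint to the scheduler, so the actual effect depends on the JVM and the OS.

```java
/** Hypothetical helper (not part of Lucene): demotes the current thread once a query overruns its budget. */
final class QueryPriorityGuard {
    private final long startNanos = System.nanoTime();
    private final long budgetNanos;
    // Remember the priority the thread had when the query started.
    private final int originalPriority = Thread.currentThread().getPriority();
    private boolean demoted;

    /** Create on the thread that executes the query. */
    QueryPriorityGuard(long budgetMillis) {
        this.budgetNanos = budgetMillis * 1_000_000L;
    }

    /**
     * Call periodically from the query's own thread.
     * Lowers the thread priority the first time the budget is exceeded.
     * Returns true if the query has been classified as long-running.
     */
    boolean checkpoint() {
        if (!demoted && System.nanoTime() - startNanos > budgetNanos) {
            Thread.currentThread().setPriority(Thread.MIN_PRIORITY);
            demoted = true;
        }
        return demoted;
    }

    /** Restore the original priority before the thread goes back to a pool. */
    void restore() {
        Thread.currentThread().setPriority(originalPriority);
        demoted = false;
    }
}
```

A caller would create the guard when the query starts, call `checkpoint()` from whatever loop collects hits, and call `restore()` in a `finally` block so a pooled thread does not stay demoted for the next request.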

On Feb 2, 2007, at 4:44 PM, Doron Cohen wrote:

> robert engels <> wrote on 02/02/2007 14:08:46:
>> You might be able to quantify the search request ahead of time (# of
>> terms, # of high frequency terms, etc.) and assign the request to the
>> appropriate pool (quick, normal, lengthy).
>> Then you can assign an appropriate # of threads to each pool.
> Or, to avoid pre-computation, requests can first be assigned to a
> 'faster' queue, assuming they are short, and only later, if a
> request turns out to be longer, it can be dynamically moved to a
> 'slower' queue, maybe less prioritized. (Similar I think to OS
> job scheduling.) (Can have more than 2 queues.)
> I wonder if there's danger that queueing queries would increase the
> avg time-to-complete, even if the total time is reduced?
>> Most people understand that complex queries might take longer to
>> execute.
>> On Feb 2, 2007, at 4:01 PM, Yonik Seeley wrote:
>>> On 2/2/07, robert engels <> wrote:
>>>> For a process that is mostly CPU bound (which is the case with Lucene
>>>> if the index is in the OS cache), having so many "active" threads
>>>> will actually hurt performance due to the context switching and
>>>> synchronization.
>>> Sure... it certainly wasn't by design to have that many threads all
>>> trying to do something.
>>>> Better to use a request queue / thread pool. (I think I read
>>>> somewhere that a good rule of thumb is 2x the number of
>>>> processors).
>>> You might hit a scenario where a couple of threads are doing long
>>> running queries, and that could lock out other queries that might
>>> otherwise execute quickly.  But overall, it's not a bad idea.
>>>> If most of the searches are IO bound, having so many disparate
>>>> requests will hurt performance as well since the disk heads will be
>>>> seeking all over the place and losing any locality of data that
>>>> Lucene provides (postings, sequential term reads, etc.).
>>> We're not hitting disk... plenty of RAM.
>>> -Yonik

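The pool-per-query-class idea from the quoted thread (estimate the cost of a request up front, then hand it to a quick, normal, or lengthy pool) could look roughly like the Java sketch below. Everything here is an assumption made for illustration: the class `QueryPoolRouter`, the classification thresholds, and the way threads are split between pools are hypothetical and untuned. Only the overall budget of 2x the number of processors comes from the rule of thumb mentioned in the thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical router (not part of Lucene): picks a thread pool from a cheap up-front cost estimate. */
final class QueryPoolRouter {
    enum Tier { QUICK, NORMAL, LENGTHY }

    private final ExecutorService quick;
    private final ExecutorService normal;
    private final ExecutorService lengthy;

    QueryPoolRouter() {
        // Rule of thumb from the thread: about 2x the number of processors in total.
        int total = 2 * Runtime.getRuntime().availableProcessors();
        // Assumed split; every pool gets at least one thread, so on very
        // small machines the sum can slightly exceed the total budget.
        int quickThreads = Math.max(1, total / 2);
        int normalThreads = Math.max(1, total / 3);
        int lengthyThreads = Math.max(1, total - quickThreads - normalThreads);
        quick = Executors.newFixedThreadPool(quickThreads);
        normal = Executors.newFixedThreadPool(normalThreads);
        lengthy = Executors.newFixedThreadPool(lengthyThreads);
    }

    /** Illustrative thresholds only; real values would have to be measured. */
    static Tier classify(int termCount, int highFrequencyTermCount) {
        if (highFrequencyTermCount >= 3 || termCount > 20) return Tier.LENGTHY;
        if (highFrequencyTermCount >= 1 || termCount > 5) return Tier.NORMAL;
        return Tier.QUICK;
    }

    /** Submit a search to the pool that matches its estimated cost. */
    <T> Future<T> submit(int termCount, int highFrequencyTermCount, Callable<T> search) {
        switch (classify(termCount, highFrequencyTermCount)) {
            case LENGTHY: return lengthy.submit(search);
            case NORMAL:  return normal.submit(search);
            default:      return quick.submit(search);
        }
    }

    /** Stop accepting work; already submitted searches still run to completion. */
    void shutdown() {
        quick.shutdown();
        normal.shutdown();
        lengthy.shutdown();
    }
}
```

Because each tier has its own fixed pool, a few long-running queries can only occupy the lengthy pool's threads, which addresses the lock-out scenario raised in the thread. A query that is misclassified still stays in the pool it was first assigned to.
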
To unsubscribe, e-mail:
For additional commands, e-mail: