lucene-java-user mailing list archives

From Supun Edirisinghe <su...@office.vtourist.com>
Subject Re: problems with lucene in multithreaded environment
Date Thu, 03 Jun 2004 19:11:14 GMT
I noticed delays when concurrent threads query an IndexSearcher too.

Our index is about 550MB with about 850,000 docs. Each doc has 20-30 
fields, of which only 3 are indexed. Our queries are not very complex: 
just 3 required term queries.

This is what my test did:

initialize an array of terms that are known to appear in the indexed 
Keyword fields
initialize an IndexSearcher
start a number of threads; each thread does the following 30 times:
	picks random terms from the array and builds a boolean query
	queries the IndexSearcher and extracts all 20-30 fields from the 
	first 10 hits
	waits 0.5 seconds
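
The test loop above can be sketched roughly as follows. This is only a 
minimal harness, not the original test code: the SearchTask interface, 
the class name and the placeholder workload are hypothetical stand-ins, 
and the real test would build the boolean query and read the fields from 
the first 10 hits inside SearchTask.run using the one shared 
IndexSearcher.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ConcurrentSearchTimer {

    /** Hypothetical stand-in for "build a query, search, read fields from the first 10 hits". */
    public interface SearchTask {
        void run(Random random) throws Exception;
    }

    /** Runs the task on several threads and returns every per-call latency in milliseconds. */
    public static List<Long> measure(final SearchTask task, int threads,
                                     final int iterations, final long pauseMillis)
            throws InterruptedException {
        final List<Long> latencies = Collections.synchronizedList(new ArrayList<Long>());
        List<Thread> workers = new ArrayList<Thread>();
        for (int t = 0; t < threads; t++) {
            final long seed = t;
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    Random random = new Random(seed);
                    try {
                        for (int i = 0; i < iterations; i++) {
                            long start = System.nanoTime();
                            task.run(random);
                            // Record only the time spent in the task, not the pause.
                            latencies.add((System.nanoTime() - start) / 1000000L);
                            Thread.sleep(pauseMillis);
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return latencies;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload (an assumption): sleeps instead of searching.
        SearchTask fake = new SearchTask() {
            public void run(Random random) throws Exception {
                Thread.sleep(5 + random.nextInt(10));
            }
        };
        // The test described above used 30 iterations and a 500 ms pause;
        // smaller values are used here so the sketch finishes quickly.
        List<Long> latencies = measure(fake, 5, 5, 50);
        long total = 0;
        for (long ms : latencies) {
            total += ms;
        }
        System.out.println("calls=" + latencies.size()
                + " min=" + Collections.min(latencies) + "ms"
                + " max=" + Collections.max(latencies) + "ms"
                + " avg=" + (total / latencies.size()) + "ms");
    }
}
```

Keeping the per-call latencies in one list makes it easy to compare the 
typical time with the occasional long delay at each thread count.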

Typical queries returned 20 to 100 hits.

With just one thread: 30 queries ran over a span of about 20 seconds. 
Search time for each query was generally 40ms to 75ms. The longest 
search time was 445ms, but searches that took more than 100ms were rare.

With 5 threads: 150 queries ran over a span of 62 seconds. Search time 
for each query for the most part increased to 120ms to 300ms. Big 
delays were more prevalent and took 3 or 4 seconds.
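
A rough cross-check of these figures, as a sketch only. It assumes the 
reported span is wall-clock time from the first thread starting to the 
last one finishing, that all threads start together, and that each 
thread runs 30 iterations with a 0.5 second pause. The class and method 
names are made up for illustration. Dividing the span by 30 and 
subtracting the pause gives the average time per iteration spent outside 
the pause, which covers the search plus the field extraction.

```java
public class IterationTime {

    /** Average time per iteration outside the fixed pause, in milliseconds. */
    static double workMillisPerIteration(double spanSeconds, int iterationsPerThread,
                                         double pauseSeconds) {
        return (spanSeconds / iterationsPerThread - pauseSeconds) * 1000.0;
    }

    public static void main(String[] args) {
        // Figures taken from the two runs described above.
        System.out.printf("1 thread:  %.0f ms%n", workMillisPerIteration(20, 30, 0.5));
        System.out.printf("5 threads: %.0f ms%n", workMillisPerIteration(62, 30, 0.5));
    }
}
```

Under those assumptions the single-thread run spends about 167 ms per 
iteration outside the pause, and the slowest thread in the 5-thread run 
spends about 1567 ms. That is well above the typical 120ms to 300ms 
search time, which suggests the occasional multi-second delays account 
for much of the total.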

With 10 or more threads things got bad, though I didn't run enough 
tests. Most searches took 1 to 2 seconds, and some searches took 20 to 
30 seconds.

When I ran the test with 5 concurrent threads each doing just one 
query, search times were around 100ms to 200ms, with a max of 700ms.

I have not looked into the Lucene code much, and I didn't think queries 
were queued.

I ran my test with -DdisableLuceneLocks on the command line, but I 
wasn't sure it did anything.
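
One way to check whether the flag is even seen as "on" is sketched 
below. It rests on an assumption worth confirming in the Lucene source: 
that Lucene reads this property with Boolean.getBoolean. If so, a bare 
-DdisableLuceneLocks sets the property to the empty string, which 
Boolean.getBoolean treats as false, and -DdisableLuceneLocks=true would 
be needed instead. The class and helper names are hypothetical.

```java
public class LockFlagCheck {

    /** Returns what Boolean.getBoolean reports for the named system property. */
    static boolean flagEnabled(String name) {
        // True only when the property exists and equals "true" (ignoring case).
        return Boolean.getBoolean(name);
    }

    public static void main(String[] args) {
        String name = "disableLuceneLocks";
        // With a bare -DdisableLuceneLocks the raw value is the empty string.
        System.out.println("raw value: \"" + System.getProperty(name) + "\"");
        System.out.println("enabled:   " + flagEnabled(name));
    }
}
```

Running it once with -DdisableLuceneLocks and once with 
-DdisableLuceneLocks=true shows the difference between the two forms.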

I ran the test on Lucene 1.3 final on my PowerBook G4, and the tests ran 
with a lot of other processes going on.

I was interested in this discussion because I could not explain the 
delays if queries really are run in parallel.


On Jun 2, 2004, at 9:32 PM, Doug Cutting wrote:

> Jayant Kumar wrote:
>> We recently tested lucene with an index size of 2 GB
>> which has about 1,500,000 documents, each document
>> having about 25 fields. The frequency of search was
>> about 20 queries per second. This resulted in an
>> average response time of about 20 seconds approx
>> per search.
>
> That sounds slow, unless your queries are very complex.  What are your 
> queries like?
>
>> What we observed was that lucene queues
>> the queries and does not release them until the
>> results are found. so the queries that have come in
>> later take up about 500 seconds. Please let us know
>> whether there is a technique to optimize lucene in
>> such circumstances.
>
> Multiple queries executed from different threads using a single 
> searcher should not queue, but should run in parallel.  A technique to 
> find out where threads are queueing is to get a thread dump and see 
> where all of the threads are stuck.  In Solaris and Linux, sending the 
> JVM a SIGQUIT will give a thread dump.  On Windows, use Control-Break.
>
> Doug
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
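
The thread dump suggested in the quoted message can be triggered on 
Linux or Solaris with kill -QUIT followed by the JVM's process id. As an 
alternative sketch, a dump can also be taken from inside the test 
program itself. This assumes Java 5 or later, where 
Thread.getAllStackTraces() is available; the ThreadDump class name is 
made up for illustration.

```java
import java.util.Map;

public class ThreadDump {

    /** Builds a text dump of every live thread's name, state and stack. */
    public static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Calling dump() from a watchdog thread while the search threads are slow 
shows which frames they share. Many threads in the BLOCKED state on the 
same frame would point to the place where they queue.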



