lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexSearcher and multi-threaded performance
Date Tue, 11 Nov 2008 22:40:51 GMT

Nice results, thanks!

The poor disk-based scaling may be fixed by NIOFSDirectory, if you are
on Unix.  If you are on Windows it won't help (and will likely be
worse than FSDirectory), because of an apparent bug in Sun's JVM on
Windows whereby NIO positional file reads seem to share a lock under
the hood.
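
To make the difference concrete, here is a minimal sketch in plain
java.nio (not Lucene's actual code; the class and method names are
hypothetical).  It assumes the bottleneck is the usual one for a shared
file handle: seek and read are two separate calls, so every thread has
to take the same lock around them, whereas a positional read passes the
offset with the call and needs no Java-level lock.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration class, not part of Lucene.
public class PositionalReadSketch {

    // Seek-then-read on a shared handle. The two calls must happen
    // atomically, so all reader threads serialize on one monitor.
    static byte[] lockedRead(RandomAccessFile file, long pos, int len) throws IOException {
        byte[] out = new byte[len];
        synchronized (file) {
            file.seek(pos);
            file.readFully(out);
        }
        return out;
    }

    // Positional read. The offset travels with the call and the channel's
    // own position is left untouched, so no synchronized block is needed.
    static byte[] positionalRead(FileChannel channel, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            // A single call may return fewer bytes than requested, so loop.
            int n = channel.read(buf, pos + buf.position());
            if (n < 0) {
                throw new EOFException("read past end of file");
            }
        }
        return buf.array();
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("positional-read", ".bin");
        try {
            byte[] data = new byte[1024];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) i;
            }
            Files.write(path, data);
            try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r");
                 FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
                byte[] a = lockedRead(raf, 100, 16);
                byte[] b = positionalRead(ch, 100, 16);
                // Both strategies return the same bytes; only the locking differs.
                System.out.println(Arrays.equals(a, b));
            }
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

With many search threads, all of them queue on the monitor in the first
method, while the second lets the OS service the reads concurrently
(unless the JVM itself locks internally, as seems to happen on Windows).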

The poor ram-thread result for 4 & 6 threads is odd.  Those numbers
ought to be at least as good as ram-shared.  Is it possible those
columns are swapped?  I ask because the ram-shared case should have
been hurt by using a non-read-only IndexReader.

Mike

Dmitri Bichko wrote:

> Hi,
>
> I'm pretty new to Lucene, so please bear with me if this has been
> covered before.
>
> The wiki suggests sharing a single IndexSearcher between threads for
> best performance
> (http://wiki.apache.org/lucene-java/ImproveSearchingSpeed).  I've
> tested running the same set of queries with: multiple threads sharing
> the same searcher, with a separate searcher for each thread, both
> shared/private with a RAMDirectory in-memory index, and (just for fun)
> in multiple JVMs running concurrently (the results are in milliseconds
> to complete the whole job):
>
> threads  multi-jvm  shared  per-thread  ram-shared  ram-thread
>      1      72997   70883       72573       60308       60012
>      2      33147   48762       35973       25498       25734
>      4      16229   46828       21267       13127       27164
>      6      13088   47240       14028        9858       29917
>      8       9775   47020       10983        8948       10440
>     10       8721   50132       11334        9587       11355
>     12       7290   49002       11798        9832
>     16       9365   47099       12338       11296
>
> The shared searcher indeed behaves better with a ram-based index, but
> what's going on with the disk-based one?  It's basically not scaling
> beyond two threads. Am I just doing something completely wrong here?
>
> The test consists of about 1,500 Boolean OR queries with 1-10
> PhraseQueries each, with 1-20 Terms per PhraseQuery.  I'm using a
> HitCollector to count the hits, so I'm not retrieving any results.
> The index is about 5GB and 20 million documents.
>
> This is running on an 8 x quad-core Opteron machine with plenty of
> RAM to spare.
>
> Any idea why I would see this behaviour?
>
> Thanks,
> Dmitri
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

