lucene-java-user mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject RE: Performance searching over multiple indexes
Date Thu, 25 Oct 2007 17:02:56 GMT
Hello,

> Using more than one index will definitely decrease search 
> performance. Most of the Lucene search latency comes from 
> loading the hits. If there are no hits, the search takes a 
> short time, dozens of milliseconds, and that is constant if the 
> number of documents is below 1M. Searching 100 indexes will 
> take 100 times longer.

I was kind of experiencing this indeed :-) 

> 
> I think it's not a good way to use many indexes when the 
> document number is small. Also you can try 
> ParallelMultiSearcher which search all indexes in parallel.

Yes, I have tried it already, but for few documents (say 100,000) in 100
indexes, ParallelMultiSearcher is actually even slower (I think because
of locks; it probably only makes sense if you have a couple of very large
indexes).

> 
> I suppose you did not take the opening index time in the comparison.

No. The original reasoning behind the "many indexes" was to have fast
incremental updating: keep one index in memory and flush it to persistent
indexes every x sec. These persistent indexes are then merged, much
like the segments in Lucene. I think the idea is OK, but the number
of persistent indexes must be kept small. I'll do some more
testing.
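
To make the design above concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the "indexes" are just term-count maps, and the class name, the `maxPersistent` cap, and the "merge the two smallest" rule are all assumptions made up for illustration. It only shows the shape of the idea: buffer in memory, flush periodically, and merge so the number of persistent indexes a search has to visit stays small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for the "one in-memory index + a few persistent
// indexes" design. A TreeMap<term, count> plays the role of an index.
class TieredFlushSketch {
    private TreeMap<String, Integer> memory = new TreeMap<>();
    private final List<TreeMap<String, Integer>> persistent = new ArrayList<>();
    private final int maxPersistent; // assumed cap on persistent indexes

    TieredFlushSketch(int maxPersistent) {
        this.maxPersistent = maxPersistent;
    }

    // Incremental update: only the in-memory index is touched.
    void add(String term) {
        memory.merge(term, 1, Integer::sum);
    }

    // Would be called every x seconds: the in-memory index becomes a new
    // persistent index, then small ones are merged until the cap holds.
    void flush() {
        if (memory.isEmpty()) {
            return;
        }
        persistent.add(memory);
        memory = new TreeMap<>();
        while (persistent.size() > maxPersistent) {
            mergeTwoSmallest();
        }
    }

    private void mergeTwoSmallest() {
        persistent.sort((a, b) -> Integer.compare(a.size(), b.size()));
        TreeMap<String, Integer> small = persistent.remove(0);
        TreeMap<String, Integer> next = persistent.remove(0);
        small.forEach((term, n) -> next.merge(term, n, Integer::sum));
        persistent.add(next);
    }

    int persistentCount() {
        return persistent.size();
    }

    // A search has to visit the memory index plus every persistent index,
    // which is why the number of persistent indexes matters.
    int count(String term) {
        int total = memory.getOrDefault(term, 0);
        for (TreeMap<String, Integer> index : persistent) {
            total += index.getOrDefault(term, 0);
        }
        return total;
    }
}
```

With a cap of 3, ten flushes still leave only three persistent indexes to search, while all added terms remain countable.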

thx for your advice,

regards Ard


> 
> 
> -----Original Message-----
> From: Ard Schrijvers [mailto:a.schrijvers@hippo.nl]
> Sent: Thursday, October 25, 2007 6:09 PM
> To: java-user@lucene.apache.org
> Subject: Performance searching over multiple indexes
> 
> Hello, 
> 
> I am experimenting with the Lucene MultiSearcher, running some 
> simple BooleanQueries in which I combine a couple of 
> TermQueries. I am finding that a single Lucene index 
> for just 100,000 docs (~10 k each) is about 100 times faster 
> than about 100 separate indexes searched with MultiSearcher. 
> The difference is especially visible when the number of hits 
> gets lower (i.e., more TermQueries). A single index seems to be 
> way faster. I must admit I did optimize the single index (but I 
> can't imagine this explains the 100x). 
> 
> Is it correct that a single index is much faster when the 
> query consists of many TermQueries where the number of hits 
> is low? Does Lucene do something like starting with the term 
> that has the lowest number of hits, and then continuing with 
> the remaining terms in order of increasing hits? Is this more 
> efficient within one index, or is it the combining of the 
> hits that makes it slower? 
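
The "start with the term that has the fewest hits" idea asked about above can be sketched in plain Java as follows. This is a generic illustration of rarest-first intersection, not Lucene's actual scorer code, and the thread does not confirm what the Lucene version in question does internally. The class and method names are hypothetical, and each posting list is assumed to be a sorted array of document ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: AND several terms together by driving the
// intersection from the shortest (rarest) posting list.
class ConjunctionSketch {

    // Each int[] is one term's posting list: sorted, distinct doc ids.
    static List<Integer> intersect(List<int[]> postings) {
        List<int[]> byLength = new ArrayList<>(postings);
        byLength.sort(Comparator.comparingInt((int[] p) -> p.length));

        List<Integer> hits = new ArrayList<>();
        if (byLength.isEmpty()) {
            return hits;
        }
        // Only candidates from the rarest term are ever examined, so the
        // work is bounded by the smallest list, not the largest.
        for (int doc : byLength.get(0)) {
            boolean inAll = true;
            for (int i = 1; i < byLength.size(); i++) {
                if (Arrays.binarySearch(byLength.get(i), doc) < 0) {
                    inAll = false;
                    break; // stop at the first term that lacks this doc
                }
            }
            if (inAll) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Within one index this loop runs once. Spread over 100 indexes, the same loop plus its setup runs 100 times on small lists, so fixed per-index overhead can dominate when there are few hits; that reading is consistent with the reply quoted at the top, but it is not measured here.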
> 
> Hopefully somebody can enlighten me,
> 
> thx
> 
> Regards Ard
> 
> -- 
> 
> Hippo
> Oosteinde 11
> 1017WT Amsterdam
> The Netherlands
> Tel  +31 (0)20 5224466
> -------------------------------------------------------------
> a.schrijvers@hippo.nl / ard@apache.org / http://www.hippo.nl
> -------------------------------------------------------------- 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
