lucene-java-user mailing list archives

From Itamar Syn-Hershko <ita...@code972.com>
Subject Index size and performance degradation
Date Sat, 11 Jun 2011 19:37:51 GMT
Hi all,

I understand that Lucene indexes perform best up to a certain size, said
to be around several GBs. I haven't found a good discussion of this,
but it's my understanding that at some point it's better to split an
index into parts (a la sharding) than to keep searching one huge
index. I assume this has to do with OS and IO configurations. Can anyone
point me to more info on this?

We have a product that uses Lucene for various searches, and at the
moment each type of search uses its own Lucene index. We plan to
refactor how this works and combine all indexes into one, making the
whole system more robust and giving it a smaller memory footprint,
among other things.

Assuming the above is true, we are interested in knowing how to do this 
correctly. Initially all our indexes will be run in one big index, but 
if at some index size there is a severe performance degradation we would 
like to handle that correctly by starting a new FSDirectory index to 
flush into, or by re-indexing and moving large indexes into their own 
Lucene index.
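
To make the rollover idea concrete, here is a rough sketch in plain Java
(no Lucene calls) of the check I have in mind: measure the on-disk size
of the current index directory and, once it passes some threshold, pick
a new directory to flush into. The class name, the method names, the
"index-N" naming scheme and the threshold are all hypothetical; the
right threshold is exactly what I am asking about. Each resulting path
would then be opened as its own FSDirectory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class IndexRollover {

    // Sum the sizes of all regular files under the index directory.
    // Files that vanish mid-walk (e.g. after a segment merge) count as 0.
    static long directorySizeBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files
                .filter(Files::isRegularFile)
                .mapToLong(p -> {
                    try {
                        return Files.size(p);
                    } catch (IOException e) {
                        return 0L;
                    }
                })
                .sum();
        }
    }

    // Hypothetical policy: roll over once the index reaches the threshold.
    static boolean shouldRollOver(long currentBytes, long thresholdBytes) {
        return currentBytes >= thresholdBytes;
    }

    // Hypothetical naming scheme for successive index directories.
    static Path nextIndexPath(Path baseDir, int generation) {
        return baseDir.resolve("index-" + generation);
    }
}
```

Size on disk is only a proxy here; document count or observed query
latency might turn out to be a better trigger.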

Are there any guidelines for measuring or estimating this correctly?
What should we be aware of while considering all this? We can't assume
anything about the machine running it, so testing on our own hardware
won't really tell us much...

Thanks in advance for any input on this,

Itamar.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

