lucene-dev mailing list archives

From Doug Cutting <>
Subject Re: large index scalability and java 1.1 compatibility question
Date Tue, 20 Jan 2004 18:01:37 GMT
Mike Sawka wrote:
> We are currently running some multi-gigabyte indexes with over 10
> million documents, and the "optimize" time is starting to become a
> problem.  For our largest indexes we're already seeing times of 10-20
> minutes, on a fairly decent machine, which is starting to hit the
> threshold of acceptability for us (and will become unbearable as the
> index grows 2-10 times larger).  So I've got two questions:
>    * Are there any tricks that you guys use to run large (incrementally
> updatable) indexes?  I've already set up a mirroring system so I have one
> index that is always searchable while the other one is incrementally
> updating (and they swap periodically).
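The mirror/swap scheme described above boils down to an atomic pointer swap: queries always read whichever index is currently active, while the mirror is updated and optimized off to the side, then published. Here is a minimal, Lucene-free sketch of that pattern; the `Index` stand-in class and all names are hypothetical, not part of any Lucene API:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;

public class MirrorSwap {
    /** Stand-in for a searchable index snapshot (hypothetical, not Lucene). */
    static class Index {
        final List<String> docs;
        Index(List<String> docs) { this.docs = docs; }
        List<String> search(String term) {
            return docs.stream()
                       .filter(d -> d.contains(term))
                       .collect(Collectors.toList());
        }
    }

    // Queries always go through this reference.
    private final AtomicReference<Index> active = new AtomicReference<>();

    MirrorSwap(Index initial) { active.set(initial); }

    /** Searches whichever index is currently published. */
    List<String> search(String term) { return active.get().search(term); }

    /** After the mirror finishes its incremental update (and optimize),
     *  publish it atomically; searches already in flight keep using the
     *  snapshot they started with. */
    void swap(Index updatedMirror) { active.set(updatedMirror); }
}
```

The key property is that the expensive optimize happens entirely on the inactive copy, so searchers never block on it; the swap itself is a single reference assignment.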

A faster i/o system (e.g., a RAID with striping), faster CPU, faster JVM 
and more RAM will all help, but you probably already knew that.

Note also that as indexes approach 100M documents, query performance may 
begin to slow unacceptably.  For example, in one application, I 
found that a 30M document index could only withstand a few queries per 
second.  If actual traffic is greater than that, then you can distribute 
the index to multiple machines and search all sub-indexes in parallel. 
For example, you might have five machines each searching a 6M document 
index instead of a single 30M document index.

This sort of approach also makes it easier to maintain the indexes, as 
no single index is that large.  A ParallelMultiSearcher patch was 
recently submitted which makes it easy to build such a system using RMI. 
Could something like this work for you?
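The fan-out idea above can be sketched without Lucene or RMI: submit the same query to each sub-index on its own thread, then merge the per-shard top hits into a global top-k by score. This is only an illustration of the structure, not Lucene's ParallelMultiSearcher itself; the `SubIndex` and `Hit` types and all names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FanOutSearch {
    /** A scored result (hypothetical stand-in for a Lucene hit). */
    static class Hit implements Comparable<Hit> {
        final String doc;
        final double score;
        Hit(String doc, double score) { this.doc = doc; this.score = score; }
        // Sort descending by score so the best hits come first.
        public int compareTo(Hit o) { return Double.compare(o.score, score); }
    }

    /** One shard: e.g. a 6M-document sub-index on one machine. */
    interface SubIndex {
        List<Hit> search(String query, int topK);
    }

    /** Queries every shard in parallel and merges the results. */
    static List<Hit> search(List<SubIndex> shards, String query, int topK) {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (SubIndex shard : shards)
                futures.add(pool.submit(() -> shard.search(query, topK)));

            // Collect each shard's top-k, then keep the global top-k.
            List<Hit> all = new ArrayList<>();
            for (Future<List<Hit>> f : futures)
                all.addAll(f.get());
            Collections.sort(all);
            return new ArrayList<>(all.subList(0, Math.min(topK, all.size())));
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each shard only needs to return its own top-k, the merge step is cheap, and query latency is bounded by the slowest shard rather than the sum of all of them.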

