lucene-java-user mailing list archives

From Vince Taluskie
Subject Re: Query performance on a 315 Million document index (1TB)
Date Fri, 07 May 2004 09:54:57 GMT
On Fri, 7 May 2004, Will Allen wrote:

> Hi,
> 	I am considering a project that would index 315+ million
> documents. I am comfortable that the indexing will work well in creating
> an index ~800GB in size, but am concerned about the query performance.
> (Is this a bad assumption?)

How fast do you need to return a response from a search?  The largest
index that I've created has over 200M documents and is about 125GB in
size.  The app has fairly low performance requirements and was done with
pretty minimal hardware...  

> What are the bottlenecks of performance as an index scales?  Memory?

Yeah, I find that a 2GB heap size can be a bit tight with an index that
size.  16GB sounds about right, but make sure your JVM can actually use it
(a 32-bit JVM can't address a heap that large, so you'd need a 64-bit JVM
and OS).
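
One quick way to check what the JVM will really give you is to ask it at
startup.  This is only a minimal sketch: the HeapCheck class and its method
names are made up for illustration and are not part of Lucene.  It relies
only on the standard Runtime.maxMemory() call, which reports the ceiling
the JVM will try to use (normally whatever -Xmx was set to).

```java
// Hypothetical helper (not part of Lucene): reports the heap ceiling
// that the running JVM will actually attempt to use.
final class HeapCheck {

    /** Maximum heap, in bytes, that this JVM will attempt to use. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** True if the heap ceiling is at least the requested number of megabytes. */
    static boolean hasAtLeastMb(long requiredMb) {
        return maxHeapBytes() >= requiredMb * 1024L * 1024L;
    }

    public static void main(String[] args) {
        long mb = maxHeapBytes() / (1024L * 1024L);
        System.out.println("Max heap available to this JVM: " + mb + " MB");
        // os.arch is a standard system property; it hints at a 32- vs 64-bit platform.
        System.out.println("Platform architecture: " + System.getProperty("os.arch"));
    }
}
```

Run it with the same -Xmx setting you plan to use for the search app; if
the number printed is far below what you asked for, the JVM or OS is the
limit, not the hardware.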

> Cost is not a concern, so what would be the shortcomings of a
> theoretical machine with 16GB of ram, 4-16 cpus and 1-2 terabytes of
> space?  Would it be better to cluster machines to break apart the
> query?

Assuming the budget is there and your design can use all those cpus
effectively, I think your main worry would be the underlying disk
subsystem and how fast you can read the blocks you need from the index.
Use the smallest 10k rpm (or 15k rpm) drives you can, so the subsystem
isn't spindle bound, add multiple fibre HBAs, and consider breaking that
massive index apart into smaller sub-indexes.
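
To make the sub-index idea concrete, here is a rough sketch of the fan-out:
send the query to every sub-index in parallel, then merge the per-index
hits and keep the best ones.  Everything named here (Hit, SubIndex,
ShardedSearch) is hypothetical and uses only the JDK; it is not Lucene's
API.  Lucene ships its own MultiSearcher for this kind of thing, which this
sketch does not use.  It also assumes scores from different sub-indexes can
be compared directly, which is only approximately true when each index
computes its own term statistics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical search hit: a document id plus its relevance score. */
final class Hit {
    final String docId;
    final float score;
    Hit(String docId, float score) { this.docId = docId; this.score = score; }
}

/** Hypothetical stand-in for one sub-index (e.g. one searcher per disk). */
interface SubIndex {
    List<Hit> search(String query, int topN) throws Exception;
}

final class ShardedSearch {

    /** Queries every sub-index in parallel and returns the topN hits overall. */
    static List<Hit> search(List<SubIndex> shards, String query, int topN)
            throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Submit one task per sub-index so the disks can work concurrently.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (SubIndex shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query, topN);
                pending.add(pool.submit(task));
            }
            // Collect every sub-index's hits, waiting for each task to finish.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get());
            }
            // Highest score first, then trim to the requested size.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

Each sub-index only has to return its own topN, so the merge step stays
small no matter how big the individual indexes get.  Put each sub-index on
its own spindles (or its own machine) and the reads stop competing with
each other.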


@work                                      @home

 vince.taluskie (at)               vince (at)
 Corporate Express; Technical Architect     Westminster, CO
 Phone:   303 664 2660            
