lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andy Goodell <good...@gmail.com>
Subject Re: Query performance on a 315 Million document index (1TB)
Date Fri, 07 May 2004 15:58:15 GMT
Although I've never indexed anything quite that large, i've had good
experiences with splitting the index out over a cluster.  (for
example, a set that would be about 4 seconds per complicated query on
one of our machines becomes around a second when spread out over 6)  I
think the reason why this helps is because of the disk I/O speed
bounding of performance that the others have mentioned, and how adding
another disk array adds to the effective disk bandwidth.

good luck
- andy g

On Fri, 07 May 2004 04:47:55 +0500, Will Allen <wga22@email.com> wrote:
> 
> Hi,
>         I am considering a project that would index 315+ million documents. I am comfortable
that the indexing will work well in creating an index ~800GB in size, but am concerned about
the query performance. (Is this a = bad
> assumption?)
> 
> What are the bottlenecks of performance as an index scales?  Memory?  = Cost is not a
concern, so what would be the shortcomings of a theoretical = machine with 16GB of ram, 4-16
cpus and 1-2 terabytes of space?  Would it be = better to cluster machines to break apart
the query?
> 
> Thank you for your serious responses,
> Will Allen
> --
> ___________________________________________________________
> Sign-up for Ads Free at Mail.com
> http://promo.mail.com/adsfreejump.htm
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message