lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Query performance on a 315 Million document index (1TB)
Date Fri, 07 May 2004 06:42:49 GMT
That's big, and while I have not created such large indices with
Lucene, I would think that disk I/O would be the biggest issue.  That
is why Nutch has distributed search options built in, and their demo
has 'only' 100M documents.  Perhaps you can mimic the distributed indexing
and searching approach of Nutch.

Otis
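
The distributed search idea mentioned above can be illustrated with a
rough sketch: split the index into shards, have each shard return its own
top-k hits, and merge those lists into one global top-k.  This is only a
sketch under stated assumptions, not how Nutch or Lucene actually
implement it.  The names ShardMerge, Hit and mergeTopK are hypothetical,
and the merge assumes that scores from different shards are directly
comparable, which may not hold if each shard computes its own term
statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: none of these names come from Lucene or Nutch.
class ShardMerge {

    // One scored hit as a shard might report it.
    static final class Hit {
        final String shard;
        final int docId;
        final float score;

        Hit(String shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return shard + ":" + docId + "(" + score + ")";
        }
    }

    // Merge the per-shard hit lists into one global top-k, best score first.
    // Assumes scores are comparable across shards.
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        // Min-heap on score: the weakest hit kept so far sits at the head,
        // so it is the one evicted when the heap grows past k.
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (List<Hit> hits : perShard) {
            for (Hit h : hits) {
                heap.add(h);
                if (heap.size() > k) {
                    heap.poll();
                }
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(byScore.reversed());
        return merged;
    }

    public static void main(String[] args) {
        // Made-up hits standing in for the results of two index shards.
        List<List<Hit>> perShard = new ArrayList<>();
        perShard.add(Arrays.asList(new Hit("shardA", 1, 0.9f), new Hit("shardA", 2, 0.2f)));
        perShard.add(Arrays.asList(new Hit("shardB", 7, 0.5f), new Hit("shardB", 8, 0.4f)));
        System.out.println(mergeTopK(perShard, 3));
    }
}
```

Because each shard only has to search its own slice of the index, the
disk I/O is spread across machines, and the merge step touches at most
k hits per shard regardless of how large the shards are.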

--- Will Allen <wga22@email.com> wrote:
> Hi,
> 	I am considering a project that would index 315+ million documents.
> I am comfortable that the indexing will work well in creating an
> index ~800GB in size, but am concerned about the query performance.
> (Is this a bad assumption?)
> 
> What are the bottlenecks of performance as an index scales?  Memory?
> Cost is not a concern, so what would be the shortcomings of a
> theoretical machine with 16GB of ram, 4-16 cpus and 1-2 terabytes
> of space?  Would it be better to cluster machines to break apart
> the query?
> 
> Thank you for your serious responses,
> Will Allen


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

