lucene-java-user mailing list archives

From "Sudarsan, Sithu D." <Sithu.Sudar...@fda.hhs.gov>
Subject RE: metrics for index ~100M docs ... Correction
Date Thu, 24 Sep 2009 21:54:39 GMT
 
Hi Joel,

A couple of quick points.

1. The metric is for indexing only.

2. It is 1000 docs/minute (sorry for the earlier 1000/sec goof-up).

3. Regarding search/query, it depends on many parameters (similarity,
proximity, synonym lookup, etc.).
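
As a rough back-of-envelope check on point 2, here is a minimal sketch of
what 1000 docs/minute would imply for an index of ~100M docs. The function
name and the `machines` parameter are made up for illustration, and the
linear scaling across machines is only an assumption. The rate was measured
with approx. 100K docs including PDF text extraction, so it may not carry
over to ~5k docs.

```python
def build_time_days(total_docs: int, docs_per_minute: float, machines: int = 1) -> float:
    """Estimate wall-clock days to index total_docs at a steady rate.

    Assumes the per-machine rate stays constant and that throughput
    scales linearly with the number of machines (an idealization).
    """
    if docs_per_minute <= 0 or machines <= 0:
        raise ValueError("rate and machine count must be positive")
    minutes = total_docs / (docs_per_minute * machines)
    return minutes / (60 * 24)  # minutes per day


# ~100M docs at 1000 docs/minute on a single machine
print(round(build_time_days(100_000_000, 1000), 1))  # 69.4
```

So at the corrected rate a single machine would need roughly ten weeks,
which is why the docs/minute vs. docs/sec difference matters here.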

Sincerely,
Sithu D Sudarsan

-----Original Message-----
From: Sudarsan, Sithu D. [mailto:Sithu.Sudarsan@fda.hhs.gov] 
Sent: Thursday, September 24, 2009 1:11 PM
To: java-user@lucene.apache.org
Subject: RE: metrics for index ~100M docs

 
Hi Joel,

With approx. 100K doc size, on a dual quad-core machine (3.0 GHz),
Windows platform, we average 1000 docs/sec. This includes text
extraction from PDF docs.

Hope this helps.

Sincerely,
Sithu D Sudarsan


-----Original Message-----
From: Joel Halbert [mailto:joel@su3analytics.com] 
Sent: Thursday, September 24, 2009 11:17 AM
To: Lucene Users
Subject: metrics for index ~100M docs

Hi,

Does anyone know of any recent metrics & stats on building out an index
of ~100M documents (each doc approx. 5k)? I'm looking for approx. stats
on time to build, time to query and infrastructure requirements (number
of machines & spec) to reasonably support an index of such a size.

Thanks, 
Joel


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



