lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SolrPerformanceData" by TomBurtonWest
Date Sat, 20 Feb 2010 01:11:11 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrPerformanceData" page has been changed by TomBurtonWest.
http://wiki.apache.org/solr/SolrPerformanceData?action=diff&rev1=14&rev2=15

--------------------------------------------------

  
  == HathiTrust Large Scale Solr Benchmarking ==
  
+ [[http://www.hathitrust.org|HathiTrust]] ''makes the digitized collections of some of the
nation’s great research libraries available for all.''  We currently have slightly over
5 million full-text books indexed.  Our production index is spread across 10 shards on 4 machines.
With a total index size of over 2 terabytes, our biggest bottleneck is disk I/O.  Using CommonGrams
reduced disk I/O significantly, but it remains the main performance bottleneck.

- [[http://www.hathitrust.org|HathiTrust]] ''makes the digitized collections of some of the
nation’s great research libraries available for all.''  We are planning to index 20 million
full-text books in Solr. 
- Our current index for 1 million full text books is about 225GB and we are getting average
response times of about 1/2 a second, but the 0.5% slowest queries are taking between 10 seconds
and 2 minutes.  We are working on strategies to improve overall response time.
  
+ On our production index, the average Solr response time is around 200 ms, the median is 90 ms,
the 90th percentile is about 450 ms, and the 99th percentile is about 1.4 seconds.  Details
on the hardware are available at
+ [[http://www.hathitrust.org/blogs/large-scale-search/new-hardware-searching-5-million-volumes-full-text|New hardware for searching 5 million plus volumes]].
Some details on performance are available at
[[http://www.hathitrust.org/blogs/large-scale-search/performance-5-million-volumes|Performance at 5 million volumes]].
Background and updates are available at
[[http://www.hathitrust.org/blogs/large-scale-search|The HathiTrust Large Scale Search blog]].
- Our benchmarking efforts to date are reported in 
-  * [[http://www.hathitrust.org/large_scale_search|The HathiTrust Large Scale Search page]]
-  * [[http://www.hathitrust.org/technical_reports/Large-Scale-Search.pdf|Technical Report
on Large Scale Search Benchmarking (pdf)]]
-  * [[http://www.hathitrust.org/blogs/large-scale-search|updates (including hardware information)]]
-  * [[http://www.hathitrust.org/documents/HathiTrust-DLFForum-200905.ppt|part of a panel
presentation at the DLF (powerpoint)]]
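
The response-time figures in the added text (average, median, 90th and 99th percentile) can be reproduced from any list of per-query timings. The following is only a minimal sketch: the sample values are made up for illustration, the helper names `percentile` and `summarize` are hypothetical, and the nearest-rank percentile definition is an assumption, since the page does not say how its percentiles were computed. It also shows why an average can sit well above the median when a few queries are very slow.

```python
import math
import statistics


def percentile(sorted_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    rank = max(1, math.ceil(p * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def summarize(latencies_ms):
    """Return mean, median, 90th and 99th percentile of response times."""
    ordered = sorted(latencies_ms)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p90": percentile(ordered, 90),
        "p99": percentile(ordered, 99),
    }


# Hypothetical response times in milliseconds (not HathiTrust data).
sample_ms = [40, 60, 70, 80, 90, 90, 100, 150, 450, 1400]
summary = summarize(sample_ms)
print(summary)
```

In this made-up sample two slow queries pull the mean far above the median, which is the same pattern as the reported 200 ms average against a 90 ms median.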
  
