lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SolrPerformanceData" by TomBurtonWest
Date Fri, 24 Jul 2009 23:03:44 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by TomBurtonWest:
http://wiki.apache.org/solr/SolrPerformanceData

------------------------------------------------------------------------------
  
  == HathiTrust Large Scale Solr Benchmarking ==
  
- [http://www.hathitrust.org HathiTrust] ''makes the digitized collections of some of the
nation’s great research libraries available for all.''  A major part of this effort is being
able to support full text search of the contents of these digitized collections.  As part
of this effort they have embarked on an effort to develop rigorously researched benchmarks
for Solr and Lucene that can be used to predict future performance as the HathiTrust expands.
 
+ [http://www.hathitrust.org HathiTrust] ''makes the digitized collections of some of the
nation’s great research libraries available for all.''  We are planning to index 20 million
full-text books in Solr. 
+ Our current index for 1 million full-text books is about 225GB and we are getting average
response times of about half a second, but the slowest 0.5% of queries are taking between 10 seconds
and 2 minutes.  We are working on strategies to improve overall response time.
  
+ Our benchmarking efforts to date are reported in:
+  * [http://www.hathitrust.org/large_scale_search The HathiTrust Large Scale Search page]
+  * [http://www.hathitrust.org/technical_reports/Large-Scale-Search.pdf Technical Report
on Large Scale Search Benchmarking (pdf)]
+  * [http://www.hathitrust.org/blogs/large-scale-search updates (including hardware information)]
+  * [http://www.hathitrust.org/documents/HathiTrust-DLFForum-200905.ppt part of a panel presentation
at the DLF (powerpoint)]
- They have focused on 5 areas:
-  1. Growing the index
-  1. Impact of memory
-  1. Using shards
-  1. Load testing
-  1. Faceting results
  
- This is an ongoing research effort, with the results written up at http://www.hathitrust.org/technical_reports/Large-Scale-Search.pdf.
- 
