lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Trivial Update of "SolrPerformanceFactors" by YonikSeeley
Date Thu, 29 Oct 2009 04:21:12 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrPerformanceFactors" page has been changed by YonikSeeley.
http://wiki.apache.org/solr/SolrPerformanceFactors?action=diff&rev1=21&rev2=22

--------------------------------------------------

     * Con: More segment merges slow down indexing.
  
  === HashDocSet Max Size Considerations ===
+ <!> This is only a consideration for Solr 1.3 and earlier.
  
  The hashDocSet is an optimization specified in solrconfig.xml that enables an int hash representation for filters (docSets) when the number of items in the set is less than maxSize. For smaller sets, this representation is more memory efficient, more efficient to iterate, and faster to intersect.
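As a point of reference, in Solr 1.3 and earlier this option lives in the query section of solrconfig.xml; a typical entry (the values shown are the illustrative defaults from the example config, not tuning advice) looks roughly like:

```xml
<!-- Solr 1.3 and earlier only: use a hash-based DocSet for filters
     whose cardinality is below maxSize (values are illustrative) -->
<HashDocSet maxSize="3000" loadFactor="0.75"/>
```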
  
@@ -114, +115 @@

  
  Consult the documentation for the application server you are using (e.g. Tomcat, Resin, Jetty, etc.) for more information on how to configure page compression.
  
- == Embedded vs HTTP Post ==
+ == Indexing Performance ==
+ You can use [EmbeddedSolr] to do bulk indexing against an embedded instance of Solr and avoid HTTP overhead entirely. Most of that overhead, however, is latency, and it can be hidden by using multiple threads and sending multiple documents per add request. SolrJ (the Solr Java client) provides [[http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html|StreamingUpdateSolrServer]], which makes this easy by opening multiple connections to a Solr instance and streaming the added documents over those open connections.
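A minimal SolrJ sketch of this approach (the server URL, queue size, thread count, and field names below are illustrative placeholders, and a running Solr instance is assumed):

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Queue up to 20 documents internally and stream them over
        // 4 concurrent connections (both values are illustrative).
        SolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 4);

        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("name", "Document " + i);
            server.add(doc);  // queued and streamed asynchronously
        }
        server.commit();  // flush pending adds and make them searchable
    }
}
```

Because adds are queued and streamed in the background, the calling thread spends almost no time blocked on the network, which is where most of the HTTP-post overhead normally goes.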
  
+ Other bulk update methods such as the [[UpdateCSV|CSV Loader]] also offer very good performance.
- Using an [EmbeddedSolr] for indexing can be over 50% faster than one using XML messages that are posted.
- 
- For example it took 2:10:23 to index 3 million records and optimize, while it took 3:21:36 on the same machine to index using HTTP Post with 10 records/post or 2:37:17 with 200 records/post. If you consider that optimize is only one call, then the difference is slightly bigger. The machine for these sample numbers was a 3Ghz Pentium 4 desktop machine.
- 
- However the tradeoff is larger records/post requires greater memory footprint. As the records/post becomes higher it makes more sense to have separate threads for getting records from database/files and another for posting the XML messages to Solr (could also double buffer).
- 
- See [[http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html|java.util.concurrency javadoc]] for more information on threading.
- 
- Also consider using the [[http://svn.apache.org/repos/asf/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java|StreamingUpdateSolrServer.java]] for bulk update requests.
- 
- 
- == RAM Usage Considerations ==
  
  === OutOfMemoryErrors ===
  
@@ -143, +134 @@

  
  === Memory allocated to the Java VM ===
  
- The easiest way to fight this error, assuming the Java virtual machine isn't already using all your physical memory, is to increase the amount of memory allocated to the Java virtual machine running Solr. To do this for the example/ in the Solr distribution, if you're running the standard Sun virtual machine, you can use the -Xms and -Xmx command-line parameters:
+ The easiest way to fight this error is to increase the amount of memory allocated to the Java virtual machine running Solr. To do this for the example/ in the Solr distribution, if you're running the standard Sun virtual machine, you can use the -Xms and -Xmx command-line parameters:
  
  {{{
  java -Xms512M -Xmx1024M -jar start.jar
  }}}
