lucene-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "FAQ" by ShawnHeisey
Date Sun, 14 Feb 2016 08:29:10 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "FAQ" page has been changed by ShawnHeisey:
https://wiki.apache.org/solr/FAQ?action=diff&rev1=104&rev2=105

Comment:
Improve indexing speed questions.

  
  There's no way that Solr can guess what the user's intentions are when adding a new node
to a SolrCloud cluster.
  
- The new node might be intended for an entirely new collection.  If Solr automatically creates
replicas on a cloud with billions of documents, it might take ''hours'' for that replication
to complete, after which those replicas must be manually deleted so the new nodes can be used
for the intended purpose.  Users with very large indexes would be VERY irritated if this were
to happen automatically.
+ If Solr automatically creates replicas on a cloud with billions of documents, it might take
''hours'' for that replication to complete.  Users with very large indexes would be VERY irritated
if this were to happen automatically.
  
+ The new nodes might be intended for an entirely new collection, not new replicas on existing
collections.  Users who have this intention would also be unhappy if Solr decided to add new
replicas.
+ 
- Even when the intent '''is''' to add new replicas, Solr has no way of knowing '''which'''
collections should be replicated.  On a very large cloud with hundreds of collections, choosing
to add a replica to '''all''' of them might very well use up all the disk space on the new
node.
+ Even when the intent '''is''' to add new replicas, Solr has no way of knowing '''which'''
collections should be replicated.  On a very large cloud with hundreds of collections, choosing
to add a replica to '''all''' of them could take a very long time and use up all the disk
space on the new node.
  
  Additionally, creating replicas uses a lot of disk and network I/O bandwidth.  If a node
is added during normal hours and replication starts automatically, it might drastically affect
query performance.
  
