lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "DistributedSearch" by YonikSeeley
Date Tue, 19 Jul 2011 13:05:58 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "DistributedSearch" page has been changed by YonikSeeley:
http://wiki.apache.org/solr/DistributedSearch?action=diff&rev1=19&rev2=20

Comment:
updates to current reality

     * Doesn't support !QueryElevationComponent
     * The index could change between stages, e.g. a document that matched a query and was
subsequently changed may no longer match but will still be retrieved.
     * Doesn't currently support date faceting (see https://issues.apache.org/jira/browse/SOLR-1709)
-    * Currently only supports sorted field facets (Solr 1.4+ supports both)
-    * Number of shards is limited by number of characters allowed for GET method's URI; most
web servers generally support at least 4000 characters, but limit still exists to prevent
denial-of-service attacks.
     * Makes a high "start" parameter inefficient. For example, if you request
start=500000&rows=25 on an index with 500,000+ docs per shard, this will currently result
in about 500,000 records being sent over the network from each shard to the coordinating
Solr instance. With a single-shard index, in contrast, only 25 records would ever be sent
over the network. (Granted, setting start this high is not something many people need to do.)
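
A rough way to see the cost of a high "start" value is to count the rows involved. The sketch below assumes each shard returns its top start + rows entries so the coordinator can merge them; the class and method names are hypothetical and are not part of Solr.

```java
public class DeepPagingCost {
    // Assumption: each shard must return its top (start + rows) entries,
    // because any of them could fall inside the requested global window.
    static long rowsPerShard(long start, long rows) {
        return start + rows;
    }

    // Total rows the coordinating instance receives from all shards.
    static long totalRows(long start, long rows, int numShards) {
        return rowsPerShard(start, rows) * numShards;
    }

    public static void main(String[] args) {
        long start = 500_000;
        long rows = 25;
        System.out.println("per shard: " + rowsPerShard(start, rows));
        System.out.println("4 shards:  " + totalRows(start, rows, 4));
        System.out.println("returned to client: " + rows);
    }
}
```

The per-shard count grows with "start", while the number of rows returned to the client stays at "rows".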
  
  == Distributed Deadlock ==
  Each shard may also serve top-level query requests and then make sub-requests to all of
the other shards.
  In this configuration, care should be taken to ensure that the max number of threads serving
HTTP requests
  in the servlet container is greater than the possible number of requests from both top-level
clients and
- other shards.  If this is not the case, a distributed deadlock is possible.
+ other shards (the Solr example server is already configured correctly).  If this is not
the case, a distributed deadlock is possible.
  
  Consider the simplest case of two shards, each with just a single thread to service HTTP
requests.  Both threads could receive a top-level request concurrently, and make sub-requests
to each other.  Because there are no more remaining threads to service requests, the servlet
containers will block the incoming requests until the other pending requests are finished
(but they won't finish since they are waiting for the sub-requests).
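
The two-shard case can be imitated with plain thread pools. This is only a sketch: each pool stands in for a servlet container's HTTP threads, the names are hypothetical, and a timeout is used so the demonstration terminates (a real deadlock would simply hang).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DistributedDeadlockDemo {
    // Returns true if at least one top-level request timed out
    // while waiting for its sub-request to the other shard.
    static boolean stalls(int threadsPerShard) throws Exception {
        ExecutorService shardA = Executors.newFixedThreadPool(threadsPerShard);
        ExecutorService shardB = Executors.newFixedThreadPool(threadsPerShard);
        CountDownLatch bothBusy = new CountDownLatch(2);
        try {
            Callable<Boolean> onA = () -> topLevel(shardB, bothBusy);
            Callable<Boolean> onB = () -> topLevel(shardA, bothBusy);
            Future<Boolean> a = shardA.submit(onA);
            Future<Boolean> b = shardB.submit(onB);
            boolean aOk = a.get();
            boolean bOk = b.get();
            return !(aOk && bOk);
        } finally {
            shardA.shutdownNow();
            shardB.shutdownNow();
        }
    }

    // A top-level request: occupy a thread, then wait on a sub-request
    // that needs a free thread on the other shard.
    static boolean topLevel(ExecutorService other, CountDownLatch bothBusy)
            throws Exception {
        bothBusy.countDown();
        bothBusy.await(); // make sure both top-level requests are in flight
        Callable<String> subRequest = () -> "ok";
        Future<String> sub = other.submit(subRequest);
        try {
            sub.get(500, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // no thread was free to serve the sub-request
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 thread per shard stalls:  " + stalls(1));
        System.out.println("2 threads per shard stalls: " + stalls(2));
    }
}
```

With one thread per shard, both threads are held by top-level requests, so no sub-request can be served until one of them gives up. Adding a second thread per shard leaves room for the sub-requests.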
  
@@ -51, +49 @@

  It's up to the user to distribute documents across shards.  The easiest way to determine
which server a document should be indexed on is to use something like '''uniqueId.hashCode()
% numServers'''.
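
A minimal sketch of that routing rule in Java follows. The class and method names are hypothetical. One detail worth noting: `String.hashCode()` can be negative, and Java's `%` keeps the sign of the dividend, so the sketch uses `Math.floorMod` to keep the result in the range 0 to numServers - 1.

```java
public class ShardRouter {
    // Picks a shard index in [0, numServers) for the given unique id.
    static int shardFor(String uniqueId, int numServers) {
        if (numServers <= 0) {
            throw new IllegalArgumentException("numServers must be positive");
        }
        // floorMod never returns a negative value for a positive modulus,
        // unlike uniqueId.hashCode() % numServers.
        return Math.floorMod(uniqueId.hashCode(), numServers);
    }

    public static void main(String[] args) {
        String[] ids = {"doc-1", "doc-2", "doc-3"};
        for (String id : ids) {
            System.out.println(id + " -> shard " + shardFor(id, 2));
        }
    }
}
```

The same id always maps to the same shard as long as numServers stays fixed; changing the number of servers changes the mapping for most ids, so documents would need to be reindexed.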
  
  See Also...
+  * SolrCloud
-  * KattaIntegration
-  * ZooKeeperIntegration
  
  = Distributed Search Example =
  For simple functionality testing, it's easiest to just set up two local Solr servers on
different ports.
