lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "DistributedSearch" by HossMan
Date Tue, 15 Sep 2009 20:35:12 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by HossMan:
http://wiki.apache.org/solr/DistributedSearch

The comment on the change is:
TOC, header normalize, and links from FrontPage

------------------------------------------------------------------------------
  <!> ["Solr1.3"]
  
- == What is Distributed Search? ==
+ = What is Distributed Search? =
+ 
  When an index becomes too large to fit on a single system, or when a single query takes
too long to execute, an index can be split into multiple shards, and Solr can query and merge
results across those shards.
  
  If single queries are currently fast enough and one simply wishes to expand the capacity
(queries/sec) of the search system, then standard whole [wiki:CollectionDistribution index
replication] should be used.
  
+ [[TableOfContents()]]
+ 
- == Distributed Searching ==
+ = Distributed Searching =
  The presence of the '''shards''' parameter in a request will cause that request to be distributed
across all shards in the list.  The syntax of '''shards''' is host:port/base_url[,host:port/base_url]*
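
As an illustration of the '''shards''' syntax above, here is a minimal Java sketch that joins shard addresses into the parameter value and appends it to a query URL. The class name `ShardsParam`, the `/select` handler path, and the host names used in `main` are assumptions for illustration and are not taken from this page.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: builds a request URL carrying a shards parameter.
final class ShardsParam {

    // Joins addresses into host:port/base_url[,host:port/base_url]*
    static String join(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        return String.join(",", shards);
    }

    // coordinator is the host:port/base_url of the server receiving the
    // top-level request; "/select" is assumed to be the query handler path.
    static String queryUrl(String coordinator, List<String> shards, String q) {
        return "http://" + coordinator + "/select?shards=" + join(shards)
                + "&q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed local test servers on two different ports.
        List<String> shards = List.of("localhost:8983/solr", "localhost:7574/solr");
        System.out.println(queryUrl("localhost:8983/solr", shards, "ipod solr"));
    }
}
```

Because every shard address ends up in the request URI, a long shard list makes the URL grow accordingly.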
  
  Currently, only query requests will be distributed.  This includes requests to the standard
request handler (and subclasses such as the dismax request handler), and any other handler
(org.apache.solr.handler.component.SearchHandler) using standard components that support distributed
search.
@@ -21, +24 @@

  
  See also WritingDistributedSearchComponents
  
- == Distributed Searching Limitations ==
+ = Distributed Searching Limitations =
  
     * Documents must have a unique key
     * When duplicate doc IDs are received, Solr chooses the first doc and discards subsequent
ones
@@ -32, +35 @@

     * Currently only supports sorted field facets (Solr 1.4+ supports both)
     * The number of shards is limited by the number of characters allowed in a GET request URI;
most web servers accept at least 4000 characters, but they still enforce a limit to guard against
denial-of-service attacks.
  
- === Distributed Deadlock ===
+ == Distributed Deadlock ==
  Each shard may also serve top-level query requests and then make sub-requests to all of
the other shards.
  In this configuration, care should be taken to ensure that the max number of threads serving
HTTP requests
  in the servlet container is greater than the possible number of requests from both top-level
clients and
@@ -40, +43 @@

  
  Consider the simplest case of two shards, each with just a single thread to service HTTP
requests.  Both threads could receive a top-level request concurrently, and make sub-requests
to each other.  Because there are no more remaining threads to service requests, the servlet
containers will block the incoming requests until the other pending requests are finished
(but they won't finish since they are waiting for the sub-requests).
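
The two-shard case can be imitated outside Solr. The sketch below is an analogy only: each "shard" is modelled as a fixed-size thread pool (an assumption, not Solr's or any servlet container's actual code), and a short timeout stands in for the indefinite blocking so the program terminates. The names `ShardDeadlockDemo`, `topLevelTimesOut`, and `handle` are hypothetical.

```java
import java.util.concurrent.*;

// Analogy only: thread pools stand in for servlet container request threads.
final class ShardDeadlockDemo {

    // Returns true if a top-level request gave up waiting for its sub-request.
    static boolean topLevelTimesOut(int threadsPerShard) {
        ExecutorService shardA = Executors.newFixedThreadPool(threadsPerShard);
        ExecutorService shardB = Executors.newFixedThreadPool(threadsPerShard);
        CountDownLatch bothBusy = new CountDownLatch(2);
        try {
            // Both shards receive a top-level request concurrently.
            Future<Boolean> a = shardA.submit(() -> handle(shardB, bothBusy));
            Future<Boolean> b = shardB.submit(() -> handle(shardA, bothBusy));
            return a.get() || b.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            shardA.shutdownNow();
            shardB.shutdownNow();
        }
    }

    // Handles a top-level request by sending a sub-request to the other shard.
    private static boolean handle(ExecutorService other, CountDownLatch bothBusy)
            throws Exception {
        bothBusy.countDown();
        bothBusy.await(); // make sure both top-level requests are in flight
        Future<String> sub = other.submit(() -> "partial results");
        try {
            sub.get(200, TimeUnit.MILLISECONDS);
            return false; // the other shard had a free thread
        } catch (TimeoutException e) {
            sub.cancel(true);
            return true; // the other shard's only thread was busy waiting on us
        }
    }

    public static void main(String[] args) {
        System.out.println("1 thread per shard, stuck:  " + topLevelTimesOut(1));
        System.out.println("2 threads per shard, stuck: " + topLevelTimesOut(2));
    }
}
```

With a single thread per pool, each sub-request sits in the other pool's queue behind the top-level request that is waiting for it; adding a second thread lets the sub-requests run.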
  
- == Distributed Indexing ==
+ = Distributed Indexing =
  It's up to the user to distribute documents across shards.  The easiest way to determine
which server a document should be indexed on is to use something like '''uniqueId.hashCode()
% numServers'''.
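
A minimal Java sketch of the hash-based assignment described above. One caveat when taking the expression literally in Java: `hashCode()` may be negative, and `%` then yields a negative result, so this sketch uses `Math.floorMod` to keep the index in the range 0 to numServers - 1. The class and method names (`ShardRouter`, `shardFor`, `shardForHash`) and the sample IDs are hypothetical.

```java
// Hypothetical helper: maps a document's unique key to a server index.
final class ShardRouter {

    // floorMod keeps the result non-negative even for negative hash codes.
    static int shardForHash(int hash, int numServers) {
        if (numServers <= 0) {
            throw new IllegalArgumentException("numServers must be positive");
        }
        return Math.floorMod(hash, numServers);
    }

    static int shardFor(String uniqueId, int numServers) {
        return shardForHash(uniqueId.hashCode(), numServers);
    }

    public static void main(String[] args) {
        // Sample IDs are made up for illustration.
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            System.out.println(id + " -> server " + shardFor(id, 2));
        }
    }
}
```

The same ID always maps to the same server as long as numServers stays fixed; changing the number of servers changes the mapping for most documents, so a full reindex would typically be needed.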
  
- == Example ==
+ See Also...
+  * KattaIntegration
+  * ZooKeeperIntegration
+ 
+ = Distributed Search Example =
  For simple functionality testing, it's easiest to just set up two local Solr servers on
different ports.
  {{{
  #make a copy 
