lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "DistributedSearch" by YonikSeeley
Date Wed, 27 Feb 2008 18:50:13 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/DistributedSearch

New page:
<!> ["Solr1.3"]

== What is Distributed Search? ==
When an index becomes too large to fit on a single system, or when a single query takes too
long to execute, an index can be split into multiple shards, and Solr can query and merge
results across those shards.
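
As a rough illustration of what "merge results across shards" means, the sketch below combines per-shard hit lists and keeps the best-scoring hits overall. It is a conceptual sketch only and not Solr's actual merge code: the `MergeSketch` class, the `Hit` type, and the `mergeTop` method are hypothetical names introduced for this example, and it assumes scores from different shards are directly comparable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MergeSketch {
    /** Hypothetical hit: a document id plus its relevance score. */
    public static final class Hit {
        public final String id;
        public final double score;

        public Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Combines the per-shard hit lists and keeps the top `rows` hits by descending score. */
    public static List<Hit> mergeTop(List<List<Hit>> perShard, int rows) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        // Highest score first.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(rows, all.size())));
    }

    public static void main(String[] args) {
        // Made-up ids and scores, standing in for the responses of two shards.
        List<Hit> shard1 = Arrays.asList(new Hit("a1", 0.9), new Hit("a2", 0.2));
        List<Hit> shard2 = Arrays.asList(new Hit("b1", 0.5));
        for (Hit h : mergeTop(Arrays.asList(shard1, shard2), 2)) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

Each shard only needs to return its own best hits for the requested page; the merge step then picks the overall winners from that much smaller set.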

If single queries are already fast enough and the goal is only to increase the capacity (queries/sec)
of the search system, then standard whole [wiki:CollectionDistribution index replication]
should be used instead of splitting the index.

== Distributed Searching ==
The presence of the '''shards''' parameter in a request will cause that request to be distributed
across all shards in the list.  The syntax of '''shards''' is host:port/base_url[,host:port/base_url]*
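
To make the '''shards''' syntax concrete, here is a minimal sketch that joins a list of shard addresses with commas and appends the result to a select URL. The `ShardsUrl` class and its methods are hypothetical helpers written for this page, not part of Solr; the hosts, ports, and query are the ones used in the example further down.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ShardsUrl {
    /** Joins shard addresses (host:port/base_url) into the value of the shards parameter. */
    public static String shardsParam(String... shards) {
        return String.join(",", shards);
    }

    /** Builds a select URL on the server that receives the request and fans it out. */
    public static String selectUrl(String receivingServer, String query, String... shards) {
        try {
            return "http://" + receivingServer + "/select?shards=" + shardsParam(shards)
                    + "&q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("localhost:8983/solr", "ipod solr",
                "localhost:8983/solr", "localhost:7574/solr"));
    }
}
```

Note that the server receiving the request may itself appear in the shard list, as it does here.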

Currently, only query requests are distributed.  This includes requests to the standard
request handler (and subclasses such as the dismax request handler), and to any other
org.apache.solr.handler.component.SearchHandler that uses standard components supporting
distributed search.

The components that currently support distributed search are:
   * The Query component, which returns documents matching a query
   * The Facet component, for facet.query and facet.field requests where facet.sort=true (the default)
   * The Highlighting component
   * The Debug component

== Distributed Indexing ==
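It's up to the user to distribute documents across shards.  The easiest way to determine
which server a document should be indexed on is to use something like
'''uniqueId.hashCode() % numServers''' (taking care that hashCode() can be negative in Java,
so the remainder may need to be made non-negative).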
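
A minimal sketch of the hash-based routing described above. The `ShardRouter` class and `shardFor` method are hypothetical names for this example. It uses `Math.floorMod` rather than a plain `%`, an adjustment assumed here because `String.hashCode()` can be negative and `%` would then yield a negative index.

```java
public class ShardRouter {
    /** Returns a shard index in [0, numServers) derived from the document's unique id. */
    public static int shardFor(String uniqueId, int numServers) {
        if (numServers <= 0) {
            throw new IllegalArgumentException("numServers must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(uniqueId.hashCode(), numServers);
    }

    public static void main(String[] args) {
        // The two local servers from the example setup; the ids are made up.
        String[] servers = {"localhost:8983/solr", "localhost:7574/solr"};
        String[] ids = {"doc-1", "doc-2", "doc-3"};
        for (String id : ids) {
            System.out.println(id + " -> " + servers[shardFor(id, servers.length)]);
        }
    }
}
```

The same function has to be used every time a document is added, updated, or deleted, so that a given id always maps to the same server. Changing the number of servers changes the mapping for most ids, which generally means reindexing.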

== Example ==
For simple functionality testing, it's easiest to just set up two local Solr servers on different
ports.
{{{
#make a copy 
cd solr
cp -r example example7574

#change the port number
perl -pi -e 's/8983/7574/g' example7574/etc/jetty.xml example7574/exampledocs/post.sh

#in window 1, start up the server on port 8983
cd example
java -server -jar start.jar

#in window 2, start up the server on port 7574
cd example7574
java -server -jar start.jar

#in window 3, index some example documents to each server
cd example/exampledocs
./post.sh [a-m]*.xml
cd ../../example7574/exampledocs
./post.sh [n-z]*.xml

#now do a distributed search across both servers with your browser or curl
curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'
}}}
