lucene-solr-commits mailing list archives

From: Apache Wiki <wikidi...@apache.org>
Subject: [Solr Wiki] Update of "SolrCloud" by ShawnHeisey
Date: Sat, 26 Jan 2013 19:15:41 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrCloud" page has been changed by ShawnHeisey:
http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=88&rev2=89

Comment:
Added information about solr port, zookeeper, and solr.xml.

  
  If you haven't yet, go through the simple [[http://lucene.apache.org/solr/tutorial.html|Solr
Tutorial]] to familiarize yourself with Solr. Note: reset all configuration and remove documents
from the tutorial before going through the cloud features. Copying the example directories
with pre-existing Solr indexes will cause document counts to be off.
  
  Solr embeds and uses ZooKeeper as a repository for cluster configuration and coordination -
think of it as a distributed filesystem that contains information about all of the Solr servers.
+ 
+ If you want to use a port other than 8983 for Solr, see the note about solr.xml under Parameter
Reference below.
  
  === Example A: Simple two shard cluster ===
  {{http://people.apache.org/~markrmiller/2shard2server.jpg}}
@@ -165, +167 @@

  Create http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4
  
  About the params
+ 
   * '''name''': The name of the collection to be created
   * '''numShards''': The number of logical shards (sometimes called slices) to be created
as part of the collection
   * '''replicationFactor''': The number of copies of each document (or, the number of physical
replicas to be created for each logical shard of the collection). A replicationFactor of
3 means that there will be 3 replicas (one of which is normally designated to be the leader)
for each logical shard. NOTE: in Solr 4.0, replicationFactor was the number of ''additional''
copies as opposed to the total number of copies.
   * '''maxShardsPerNode''': A create operation will spread numShards*replicationFactor shard-replicas
across your live Solr nodes, fairly distributed, and never with two replicas of the same shard
on the same Solr node. If a Solr node is not live when the create operation is carried out,
it will not get any parts of the new collection. To prevent too many replicas from being created
on a single Solr node, use maxShardsPerNode to set a limit on how many replicas the create
operation is allowed to create on each node - the default is 1. If the entire collection
(numShards*replicationFactor replicas) cannot fit on your live Solr nodes, the operation will
not create anything at all.
   * '''createNodeSet''': If not provided, the create operation will spread shard-replicas
across all of your live Solr nodes. You can provide the "createNodeSet" parameter to change
the set of nodes the shard-replicas are spread across. The format of values for this param is
"<node-name1>,<node-name2>,...,<node-nameN>" - e.g. "localhost:8983_solr,localhost:8984_solr,localhost:8985_solr"
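
  As a sketch, a create call combining these parameters might look like the following (host
names and parameter values are illustrative, and chosen so that numShards*replicationFactor
fits within maxShardsPerNode across the listed nodes):

  {{{
  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&maxShardsPerNode=2&createNodeSet=localhost:8983_solr,localhost:8984_solr'
  }}}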
  
  Delete http://localhost:8983/solr/admin/collections?action=DELETE&name=mycollection
  
@@ -298, +300 @@

  
  
  === SolrCloud Instance Params ===
  These are set in solr.xml, but by default they are set up in solr.xml to also work with system
properties.  Important note: the port found here will be used (via ZooKeeper) to inform the
rest of the cluster what port each Solr instance is using.  The default port is 8983.  The
example solr.xml uses the jetty.port system property, so if you want to use a port other than
8983, you must either set this property when starting Solr or change solr.xml to fit your
particular installation.
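
  For example, assuming the stock jetty setup from the Solr example directory, starting an
instance on port 8984 might look like this (the zkHost value is illustrative):

  {{{
  java -Djetty.port=8984 -DzkHost=localhost:9983 -jar start.jar
  }}}
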
  ||host ||Defaults to the first local host address found ||If the wrong host address is found
automatically, you can override the host address with this param. ||
  ||hostPort ||Defaults to the jetty.port system property ||The port that Solr is running
on - by default this is found by looking at the jetty.port system property. ||
  ||hostContext ||Defaults to solr ||The context path for the Solr webapp.  (Note: in Solr
4.0, it was mandatory that the hostContext not contain "/" or "_" characters.  Beginning with
Solr 4.1, this limitation was removed, and it is recommended that you specify the beginning
slash.  When running in the example jetty configs, the "hostContext" system property can be
used to control both the servlet context used by jetty and the hostContext used by SolrCloud
-- e.g. {{{-DhostContext=/solr}}}) ||
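
  As a rough sketch of how these parameters can appear in solr.xml, modeled on the example
configuration shipped with Solr 4.x (exact attributes and defaults may differ in your
installation):

  {{{
  <solr persistent="false">
    <!-- host, hostPort, and hostContext fall back to system properties, so any of
         them can be overridden on the java command line at startup -->
    <cores adminPath="/admin/cores" defaultCoreName="collection1"
           host="${host:}" hostPort="${jetty.port:8983}"
           hostContext="${hostContext:solr}"
           zkClientTimeout="${zkClientTimeout:15000}">
      <core name="collection1" instanceDir="collection1" />
    </cores>
  </solr>
  }}}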
+ 
+ 
  
  
  === SolrCloud Instance ZooKeeper Params ===
@@ -372, +376 @@

  }}}
  === Zookeeper chroot ===
  If you are already using ZooKeeper for other applications and you want to keep the ZNodes
organized by application, or if you want to have multiple separate SolrCloud clusters sharing
one ZooKeeper ensemble, you can use ZooKeeper's "chroot" option. From ZooKeeper's documentation:
http://zookeeper.apache.org/doc/r3.3.6/zookeeperProgrammers.html#ch_zkSessions
+ 
  {{{
  An optional "chroot" suffix may also be appended to the connection string. This will run
the client commands while interpreting all paths relative to this root (similar to the unix
chroot command). If used the example would look like: "127.0.0.1:4545/app/a" or "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a"
where the client would be rooted at "/app/a" and all paths would be relative to this root
- ie getting/setting/etc... "/foo/bar" would result in operations being run on "/app/a/foo/bar"
(from the server perspective).
  }}}
  To use this ZooKeeper feature, simply start Solr with the "chroot" suffix in the zkHost
parameter. For example:
+ 
  {{{
  java -DzkHost=localhost:9983/foo/bar -jar start.jar
  }}}
  or
+ 
  {{{
  java -DzkHost=zoo1:9983,zoo2:9983,zoo3:9983/foo/bar -jar start.jar
  }}}
  '''NOTE:''' With Solr 4.0 you'll need to create the initial path in ZooKeeper before starting
Solr. Since Solr 4.1, the initial path will automatically be created if you are using either
''bootstrap_conf'' or ''bootstrap_confdir''.
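
  On Solr 4.0, a sketch of creating that initial path with ZooKeeper's command line client,
using the chroot from the example above:

  {{{
  # zkCli.sh ships with ZooKeeper; each znode takes a data argument,
  # whose value is unimportant here
  bin/zkCli.sh -server localhost:9983
  create /foo data
  create /foo/bar data
  }}}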
+ 
  == Known Limitations ==
  A small number of Solr search components do not support distributed search. In some cases,
a component may never get distributed support; in other cases it may just be a matter of time
and effort. All of the search components that do not yet support standard distributed search
have the same limitation with SolrCloud. You can pass distrib=false to use these components
on a single SolrCore.
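
  For example, to run such a component against only the local core (the core name and request
handler here are illustrative):

  {{{
  http://localhost:8983/solr/collection1/select?q=*:*&distrib=false
  }}}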
  
  The Grouping feature only works if groups are in the same shard. Proper support will require
custom hashing and there is already a JIRA issue working towards this.
  
  == Glossary ==
  ||'''Collection''': ||A single search index. ||
  ||'''Shard''': ||A logical section of a single collection (also called Slice). Sometimes
people will talk about "Shard" in a physical sense (a manifestation of a logical shard) ||
  ||'''Replica''': ||A physical manifestation of a logical Shard, implemented as a single
Lucene index on a SolrCore ||
  ||'''Leader''': ||One Replica of every Shard will be designated as a Leader to coordinate
indexing for that Shard ||
  ||'''SolrCore''': ||Encapsulates a single physical index. One or more make up logical shards
(or slices) which make up a collection. ||
  ||'''Node''': ||A single instance of Solr. A single Solr instance can have multiple SolrCores
that can be part of any number of collections. ||
  ||'''Cluster''': ||All of the nodes you are using to host SolrCores. ||
@@ -401, +409 @@

  
  == FAQ ==
   * '''Q:''' I'm seeing lots of session timeout exceptions - what to do?
    . '''A:''' Try raising the ZooKeeper session timeout by editing solr.xml - see the zkClientTimeout
attribute. The minimum session timeout is 2 times your ZooKeeper defined tickTime. The maximum
is 20 times the tickTime. The default tickTime is 2 seconds. You should avoid raising this
for no good reason, but it should be high enough that you don't see a lot of false session
timeouts due to load, network lag, or garbage collection pauses. Some environments might need
to go as high as 30-60 seconds.
   * '''Q:''' How do I use SolrCloud, but distribute updates myself?
    . '''A:''' Add the following UpdateProcessorFactory somewhere in your update chain: '''NoOpDistributingUpdateProcessorFactory'''
(a sketch of such a chain appears after this list)
   * '''Q:''' What is the difference between a Collection and a SolrCore?
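
  A sketch of a solrconfig.xml update chain using that factory (the chain name is illustrative;
reference it from your update handler or select it with the update.chain request parameter):

  {{{
  <updateRequestProcessorChain name="no-distrib">
    <!-- skips the distributed update step, so documents stay on the node that received them -->
    <processor class="solr.NoOpDistributingUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
  }}}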
