lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SolrReplication" by NoblePaul
Date Thu, 25 Jun 2009 09:41:37 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/SolrReplication

------------------------------------------------------------------------------
  
  = How does it work ? =
  
- This feature relies on the !IndexDeletionPolicy feature of Lucene. Through this API, Lucene
exposes !IndexCommits requests, along with the files associated with each commit. This exposure
enables us to quickly identify the files that need to be downloaded.
+ This feature relies on the [http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/core/org/apache/lucene/index/IndexDeletionPolicy.html
IndexDeletionPolicy] feature of Lucene. Through this API, Lucene exposes [http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/core/org/apache/lucene/index/IndexCommit.html
IndexCommits] as callbacks for each commit/optimize. An [http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/core/org/apache/lucene/index/IndexCommit.html
IndexCommit] exposes the files associated with each commit. This enables us to identify the
files that need to be replicated.
  
- True to the tradition of Solr, all operations are performed over a REST API. The !ReplicationHandler
exposes a REST API for discovering the current index version and the files (and their metadata)
associated with each version. The slave uses this API to discover the new files in the master's
index. The slave then determines which of those files need to be downloaded from the master.
It sends a request (using HTTP GET) to the master for the content of each file. This uses
a custom format (akin to the HTTP chunked encoding) to download the full content or a part
of each file. The slave stores downloaded files in a temp directory. Once all the required
files are downloaded, the slave moves all the files to the index directory and issues a 'commit'
 command.
+ 
+ True to the tradition of Solr, all operations are performed over a REST API. The !ReplicationHandler
exposes a REST API for all the operations it supports.
+ 
+ == What happens when I commit or optimize? ==
+ When a commit/optimize is done on the master, !ReplicationHandler reads the list of file names
associated with each commit point. It relies on the 'replicateAfter' parameter in the
configuration to decide when these file names are fetched from Lucene and stored.

+ 
+ == How does the slave replicate ? ==
+ 
+ The master is totally unaware of the slaves. The slave keeps polling the master (at the
interval set by the 'pollInterval' parameter) to check the master's current index version.
If the slave finds that the master has a newer version of the index, it initiates a replication
process. The steps are as follows:
+ 
+  * The slave issues a 'filelist' command to get the list of files. This command returns the
names of the files as well as some metadata (size, lastmodified, alias if any).
+  * The slave checks whether it already has any of those files in its local index.
It then downloads the missing files (the command name is 'filecontent'). This
uses a custom format (akin to HTTP chunked encoding) to download the full content or a
part of each file. If the connection breaks midway, the download resumes from the point
where it failed. At any point, it tries 5 times before giving up on the replication altogether.
+  * The files are downloaded into a temp dir, so if the slave or master crashes midway
it does not corrupt anything. It just aborts the current replication.
+  * After the download completes, all the new files are 'mov'ed to the live index directory,
and each file's timestamp is the same as its counterpart's on the master.
+  * A 'commit' command is issued on the slave by the slave's !ReplicationHandler and the
new index is loaded.
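
The file-selection step in the list above can be sketched roughly as follows. This is a
minimal illustration and not the !ReplicationHandler code: `FileMeta` and `files_to_download`
are hypothetical names, and the metadata is reduced to name, size and last-modified time.

```python
from typing import Dict, List, NamedTuple


class FileMeta(NamedTuple):
    """Hypothetical stand-in for one entry returned by the 'filelist' command."""
    name: str
    size: int
    last_modified: int


def files_to_download(master_files: List[FileMeta],
                      local_files: Dict[str, FileMeta]) -> List[str]:
    """Return the names of master files that are absent from the local index."""
    return [f.name for f in master_files if f.name not in local_files]


# Example data (made up): the slave already holds the first segment file.
master = [FileMeta("_1.fdt", 100, 10), FileMeta("_2.fdt", 50, 20)]
local = {"_1.fdt": FileMeta("_1.fdt", 100, 10)}
print(files_to_download(master, local))
```

Only the names returned here would then be requested with 'filecontent' and written to the
temp dir before being moved into the live index directory.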
+ 
+ 
+ == How are configuration files replicated ? ==
+ 
+  * The files to be replicated have to be listed explicitly using the 'confFiles'
parameter.
+  * Only files in the 'conf' dir of the Solr instance are replicated.
+  * The files are replicated only along with a fresh index. That means even if a file is
changed on the master, it is replicated only after there is a new commit on the master.
+  * Unlike the index files, where the timestamp is good enough to figure out if they are
identical, conf files are compared by checksum. The schema.xml files (on master
and slave) are the same if their checksums are the same.
+  * Conf files are also downloaded to a temp dir before they are 'mov'ed over the original
files. The old files are renamed and kept in the same directory. !ReplicationHandler does not
automatically clean up these old files.
+  * If a replication involved downloading at least one conf file, a core reload is issued
instead of a 'commit' command.
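
The checksum comparison and the commit-versus-reload decision described above could look
roughly like this. It is a sketch under assumptions: the page does not name the checksum
algorithm, so Adler-32 (via Python's `zlib.adler32`) is used purely for illustration, and
`conf_files_to_fetch` and `post_replication_action` are hypothetical helper names.

```python
import zlib
from typing import Dict, List


def checksum(data: bytes) -> int:
    """Checksum of a conf file's content (Adler-32 is an assumption here)."""
    return zlib.adler32(data) & 0xFFFFFFFF


def conf_files_to_fetch(master_checksums: Dict[str, int],
                        local_conf: Dict[str, bytes]) -> List[str]:
    """Names of conf files that are missing locally or whose checksum differs."""
    return sorted(
        name for name, master_sum in master_checksums.items()
        if name not in local_conf or checksum(local_conf[name]) != master_sum
    )


def post_replication_action(conf_files_downloaded: int) -> str:
    """A core reload replaces the 'commit' when any conf file was downloaded."""
    return "reload_core" if conf_files_downloaded > 0 else "commit"


# Example data (made up): only schema.xml differs between master and slave.
master_conf = {"schema.xml": b"<schema v2/>", "stopwords.txt": b"a\nthe\n"}
local_conf = {"schema.xml": b"<schema v1/>", "stopwords.txt": b"a\nthe\n"}
master_checksums = {name: checksum(data) for name, data in master_conf.items()}

changed = conf_files_to_fetch(master_checksums, local_conf)
print(changed, post_replication_action(len(changed)))
```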
+ 
+ == What if I add documents to the slave or if the slave index gets corrupted ? ==
+ 
+ If docs are added to the slave, then the slave is no longer in sync with the master.
However, it does nothing to get back in sync until the master has a newer index.
When a commit happens on the master, the index version of the master becomes different
from that of the slave. The slave fetches the list of files and finds that some of the files
(same name) are present in the local index with a different size/timestamp. This means that
the master and slave have incompatible indexes. The slave then copies all the files from the
master (there may be scope to optimize this, but this is a rare case and may not be worth it)
to a new index dir and asks the core to load the fresh index from the new directory.
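
The incompatibility check described above can be sketched as a comparison of same-named
files. This is an illustrative sketch only: `needs_full_copy` is a hypothetical name, and
each file's metadata is simplified to an assumed `(size, last_modified)` pair.

```python
from typing import Dict, Tuple

# (size, last_modified) keyed by file name; a simplified, assumed shape.
Meta = Tuple[int, int]


def needs_full_copy(master_files: Dict[str, Meta],
                    local_files: Dict[str, Meta]) -> bool:
    """True if any file exists on both sides under the same name
    but with a different size or timestamp."""
    return any(
        name in local_files and local_files[name] != meta
        for name, meta in master_files.items()
    )


# Example data (made up): the slave's _1.fdt diverged after local adds.
master_index = {"_1.fdt": (100, 10), "_2.fdt": (50, 20)}
diverged_slave = {"_1.fdt": (120, 15)}
in_sync_slave = {"_1.fdt": (100, 10)}

print(needs_full_copy(master_index, diverged_slave))
print(needs_full_copy(master_index, in_sync_slave))
```

When the check is true, the slave would fetch every file into a new index dir instead of
only the missing ones.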
+ 
  
  == HTTP API ==
  These commands can be invoked over HTTP to the !ReplicationHandler 
