lucene-solr-dev mailing list archives

From "Yajun Liu" <>
Subject Re: [jira] Commented: (SOLR-561) Solr replication by Solr (for windows also)
Date Fri, 27 Jun 2008 15:41:35 GMT
We plan to support many indexes, so there could potentially be lots of
active masters as well. Having one place to look up where to get each
index from would make operations much easier.

When I commit an update, I sometimes get a FileNotFound exception, so I
roll back to the previous snapshot. This has to happen automatically.
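
A minimal sketch of what "automatically" could look like: try the commit, and if a FileNotFoundException surfaces, restore the last good snapshot. The `Index` interface, `commitOrRollback`, and the snapshot name are hypothetical names for illustration only; they are not Solr or Lucene APIs.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

class AutoRollback {
    /** Hypothetical view of an index; not a Solr or Lucene interface. */
    public interface Index {
        void commit() throws IOException;
        void restore(String snapshotName) throws IOException;
    }

    /**
     * Commits the index. If the commit fails with FileNotFoundException,
     * restores the given last good snapshot instead.
     * Returns true if the commit succeeded, false if a rollback happened.
     */
    public static boolean commitOrRollback(Index index, String lastGoodSnapshot)
            throws IOException {
        try {
            index.commit();
            return true;
        } catch (FileNotFoundException e) {
            // Assumption: a missing file at commit time means the index is
            // unusable, so fall back to the previous snapshot.
            index.restore(lastGoodSnapshot);
            return false;
        }
    }
}
```

Other IOExceptions are deliberately left to propagate, since only the FileNotFound case is described above.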

As I mentioned, I embed Solr in our application server. The HTTP port
is used for many other things, so a dedicated port is used. And if you
look at my protocol, it is really a lower-level protocol, so putting it
on top of HTTP is not very efficient.

I don't know much about the compound file format, but I think that to
replicate it, it is not a good idea to copy the whole file, because it
could easily be more than 100M, at least in our case. We have very
small, frequent index updates, so out of 100M there must be lots of
identical blocks. If needed, I could contribute an rsync-based
implementation.
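
To illustrate the identical-block idea, here is a rough sketch that checksums fixed-size blocks and reports which ones differ. This is a simplification and not rsync itself: it only compares blocks at the same offset, whereas rsync uses a rolling checksum to find matching blocks that have shifted. The class name, method names, and block size are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

class BlockDiff {
    /** CRC32 of each fixed-size block; the last block may be shorter. */
    public static long[] blockChecksums(byte[] data, int blockSize) {
        int n = (data.length + blockSize - 1) / blockSize;
        long[] sums = new long[n];
        for (int i = 0; i < n; i++) {
            int off = i * blockSize;
            int len = Math.min(blockSize, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Indexes of blocks in the master's copy that the slave would need to fetch. */
    public static List<Integer> changedBlocks(long[] master, long[] slave) {
        List<Integer> changed = new ArrayList<>();
        for (int i = 0; i < master.length; i++) {
            // A block is fetched if the slave lacks it or its checksum differs.
            if (i >= slave.length || master[i] != slave[i]) {
                changed.add(i);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        byte[] slaveCopy = new byte[4096];
        byte[] masterCopy = new byte[5000]; // file grew after an update
        masterCopy[1500] = 1;               // one byte changed inside block 1
        System.out.println(changedBlocks(
                blockChecksums(masterCopy, 1024),
                blockChecksums(slaveCopy, 1024)));
    }
}
```

With a 1024-byte block size, only the modified block and the newly appended block would be transferred in this example; the other three blocks are reused from the local copy.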

Lastly, the hardlink is really there to avoid making copies of files,
since in each replication we only bring in new or changed files, not the whole
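
As a sketch of the hard-link idea, the following links every file of an index directory into a snapshot directory and falls back to a plain copy where linking fails. It uses `java.nio.file.Files.createLink`, which only exists in Java 7 and later, so it is an illustration of the technique rather than something from the patch; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class SnapshotLinker {
    /**
     * Hard-links each regular file in indexDir into snapshotDir, so the
     * snapshot shares disk blocks with the live index instead of copying them.
     */
    public static void snapshot(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path src : files) {
                if (!Files.isRegularFile(src)) {
                    continue; // skip subdirectories and special files
                }
                Path dst = snapshotDir.resolve(src.getFileName());
                try {
                    Files.createLink(dst, src);
                } catch (UnsupportedOperationException | IOException e) {
                    // Linking is unsupported or failed (for example across
                    // file systems): fall back to copying. A genuine I/O
                    // problem will surface again from the copy.
                    Files.copy(src, dst);
                }
            }
        }
    }
}
```

Because index files are written once and not modified afterwards, a hard-linked snapshot stays valid even after the live index deletes its own directory entry for the file.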


On Thu, Jun 26, 2008 at 11:21 PM, Noble Paul (JIRA) <> wrote:
>    [
> Noble Paul commented on SOLR-561:
> ---------------------------------
> bq: First we have an active master, some standby masters and search slaves
> This looks like a good approach. In the current design I must allow users to specify
multiple 'masterUrl' values. This should take care of one or more standby masters. It can automatically
fall back to another master if one fails.
> bq.On active master, there is an index snapshot manager. Whenever there's an update,
it takes a snapshot. On Windows, it uses copy (I should try fsutil) and on Linux it uses a hard
link. The snapshot manager also cleans up old snapshots. From time to time, I still get index
corruption when committing an update. When that happens, the snapshot manager allows us to roll back to
the previous good snapshot.
> How can I know if the index got corrupted? if I can know it the best way to implement
that would be to add a command to ReplicationHandler to rollback to latest .
> bq.On active master, there is a replication server component which listens at a specific
> Plain socket communication is more work than relying on the simple HTTP protocol. The
little extra efficiency you may achieve may not justify that (HTTP is not too slow either).
In this case the servlet container provides you with sockets, threads, etc. Take a look
at the current patch to see how efficiently it is done.
> bq.client creates a tmp directory and hard links everything from its local index directory,
then for each file in the file list, if it does not exist locally, get the new file from the server;
if it is newer than the local one, ask the server for an update like rsync; if local files do not exist
in the file list, delete them. In the case where a compound file is used for the index, the file update
will update only the differing blocks.
> The current implementation is more or less like what you have done. For a compound file
I am not sure if a diff-based sync can be more efficient, because it is hard to find the similar
blocks in the file. I rely on checksums of the whole file. If there is an efficient mechanism
to obtain identical blocks, share the code and I can incorporate it.
> The hardlink approach may not be necessary now, as I made SolrCore not hardcode
the index folder.
>> Solr replication by Solr (for windows also)
>> -------------------------------------------
>>                 Key: SOLR-561
>>                 URL:
>>             Project: Solr
>>          Issue Type: New Feature
>>          Components: replication
>>    Affects Versions: 1.3
>>         Environment: All
>>            Reporter: Noble Paul
>>         Attachments: deletion_policy.patch, SOLR-561.patch, SOLR-561.patch
>> The current replication strategy in Solr involves shell scripts. The following are
the drawbacks of that approach
>> * It does not work with Windows
>> * Replication works as a separate piece not integrated with solr.
>> * Cannot control replication from solr admin/JMX
>> * Each operation requires manual telnet to the host
>> Doing the replication in java has the following advantages
>> * Platform independence
>> * Manual steps can be completely eliminated. Everything can be driven from solrconfig.xml
>> ** Adding the url of the master in the slaves should be good enough to enable replication.
Other things like frequency of
>> snapshoot/snappull can also be configured . All other information can be automatically
>> * Start/stop can be triggered from solr/admin or JMX
>> * Can get the status/progress while replication is going on. It can also abort an
ongoing replication
>> * No need to have a login into the machine
>> This issue can track the implementation of solr replication in java
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.