lucene-solr-user mailing list archives

From Greg Solovyev <g...@zimbra.com>
Subject Re: How to restore an index from a backup over HTTP
Date Sat, 16 Aug 2014 10:03:59 GMT
Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty straightforward,
but the main concern I have is the internal data format that ReplicationHandler and SnapPuller
use. This new handler as well as the code that I've already written to download the index
files from Solr will depend on that format. Unfortunately, this format is not documented and
is not abstracted by SolrJ, so I wonder what I can do to make sure it does not change on us
without notice.

Thanks,
Greg

----- Original Message -----
From: "Shawn Heisey" <solr@elyograg.org>
To: solr-user@lucene.apache.org
Sent: Friday, August 15, 2014 7:31:19 PM
Subject: Re: How to restore an index from a backup over HTTP

On 8/15/2014 5:51 AM, Greg Solovyev wrote:
> What I want to achieve is being able to send the backed up index to Solr (either standalone
> or with ZooKeeper) in a way similar to creating a new Collection. I.e., create a new collection
> and upload an existing index directly into that Collection. I've looked through the Solr code and
> so far I have not found a handler that would allow this scenario. So, the last idea is to
> implement a special handler for this case, perhaps extending CoreAdminHandler. ReplicationHandler
> together with SnapPuller do pretty much what I need to do, except that the action has to be
> initiated by the receiving Solr server and I need to initiate the action externally. I.e.,
> instead of having a Solr slave download an index from a Solr master, I need to feed the index
> to the Solr master, and ideally this would work the same way in standalone and SolrCloud modes.


I have not made any attempt to verify what I'm stating below.  It may
not work.

What I think I would *try* is setting up a standalone Solr (no cloud) on
the backup server.  Use scripted index/config copies and Solr start/stop
actions to get the index up and running on a known core in the
standalone Solr.  Then use the replication handler's HTTP API to
replicate the index from that standalone server to each of the replicas
in your cluster.
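
As a rough, untested sketch of that last step: the replication handler accepts a
"fetchindex" command with an optional "masterUrl" parameter, so the pull can be
triggered from outside by an HTTP request to each replica. The host names, core
names and helper function names below are made up for illustration, and whether
masterUrl should end in "/replication" may depend on the Solr version, so it is
passed through unchanged here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def fetchindex_url(replica_core_url, master_url):
    """Build the URL that asks one replica core to pull an index.

    replica_core_url: base URL of the receiving core (hypothetical example:
        http://replica1:8983/solr/collection1)
    master_url: value for the masterUrl parameter, passed through as given
        (assumption: check the linked docs for the form your version expects)
    """
    query = urlencode({"command": "fetchindex", "masterUrl": master_url})
    return replica_core_url.rstrip("/") + "/replication?" + query


def trigger_fetch(replica_core_urls, master_url, timeout=30):
    """Send the fetchindex request to every replica; return the raw responses."""
    responses = {}
    for core_url in replica_core_urls:
        with urlopen(fetchindex_url(core_url, master_url), timeout=timeout) as resp:
            responses[core_url] = resp.read().decode("utf-8")
    return responses
```

Calling trigger_fetch(["http://replica1:8983/solr/collection1"],
"http://backup:8983/solr/restore") would send one request per replica. The
request only starts the fetch; it does not wait for the copy to finish, so the
"details" command on the same handler would still be needed to check progress.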

https://wiki.apache.org/solr/SolrReplication#HTTP_API
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler

One thing that I do not know is whether SolrCloud itself might interfere
with these actions, or whether it might automatically take care of
additional replicas if you replicate to the shard leader.  If SolrCloud
*would* interfere, then this idea might need special support in
SolrCloud, perhaps as an extension to the Collections API.  If it won't
interfere, then the use-case would need to be documented (on the user
wiki at a minimum) so that committers will be aware of it and preserve
the capability in future versions.  An extension to the Collections API
might be a good idea either way -- I've seen a number of questions about
capability that falls under this basic heading.

Thanks,
Shawn
