lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: copying data from one collection to another collection (solr cloud 521)
Date Mon, 13 Jul 2015 20:14:46 GMT
Actually, my question is why do it this way at all? Why not index
directly to your "live" nodes? This is what SolrCloud is built for.

There's the new backup/restore functionality that's still a work in
progress, see: https://issues.apache.org/jira/browse/SOLR-5750

You can use "implicit" routing to create shards, say, one for each week,
and age out the ones that are "too old" as well.
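
As a rough sketch of the weekly-shard idea, the Python below only builds
Collections API URLs for creating a collection with the implicit router,
adding a shard for a new week, and deleting an old one. The host, the
collection name and the shard names are made up for illustration, and
settings such as replicationFactor are left out. With the implicit router
each document also has to be sent to a named shard at index time (for
example with the _route_ parameter), which is not shown here.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust to your own SolrCloud node.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(params, solr_base=SOLR_BASE):
    """Return a Collections API URL for the given parameter dict."""
    return f"{solr_base}/admin/collections?{urlencode(params)}"


def create_weekly_collection_url(collection, shard_names):
    """CREATE a collection that uses the implicit router with named shards."""
    return collections_api_url({
        "action": "CREATE",
        "name": collection,
        "router.name": "implicit",
        "shards": ",".join(shard_names),
    })


def add_week_url(collection, shard_name):
    """CREATESHARD: add a shard for a new week."""
    return collections_api_url({
        "action": "CREATESHARD",
        "collection": collection,
        "shard": shard_name,
    })


def drop_week_url(collection, shard_name):
    """DELETESHARD: age out a shard that is "too old"."""
    return collections_api_url({
        "action": "DELETESHARD",
        "collection": collection,
        "shard": shard_name,
    })


if __name__ == "__main__":
    # Collection and shard names here are illustrative only.
    print(create_weekly_collection_url("LIVE", ["week1", "week2"]))
    print(add_week_url("LIVE", "week3"))
    print(drop_week_url("LIVE", "week1"))
```

Each URL would be issued with a plain HTTP GET against any node in the
cluster.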

Another option would be to use "collection aliasing" to keep an
offline index up to date then switch over when necessary.
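
A minimal sketch of the aliasing switch, assuming queries go to an alias
rather than to a collection name directly. The alias name "search", the
collection name "Week2" and the host are hypothetical. The code only
builds the CREATEALIAS URL; issuing CREATEALIAS again with the same alias
name replaces the existing alias, which is what does the switch.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collection):
    """Build a Collections API CREATEALIAS URL pointing `alias` at `collection`."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": collection,
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical names: clients query "search"; "Week2" is the freshly
    # built offline collection that should now serve traffic.
    print(create_alias_url("http://localhost:8983/solr", "search", "Week2"))
```

Clients keep querying the alias and do not need to know which
collection is behind it.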

I'd really like to know this isn't an XY problem, though. What's the
high-level problem you're trying to solve?

Best,
Erick

On Mon, Jul 13, 2015 at 12:49 PM, Raja Pothuganti
<RPothuganti@competitrack.com> wrote:
>
> Hi,
> We are setting up a new SolrCloud environment with 5.2.1 on Ubuntu boxes. We currently
> ingest data into a large collection, call it LIVE. After the full ingest is done we then trigger
> a delta ingestion every 15 minutes to get the documents & data that have changed
> into this LIVE instance.
>
> In Solr 4.X using a Master / Slave setup we had slaves that would periodically (weekly,
> or monthly) refresh their data from the Master rather than every 15 minutes. We're now trying
> to figure out how to get this same type of setup using SolrCloud.
>
> Question(s):
> - Is there a way to copy data from one SolrCloud collection into another quickly and
> easily?
> - Is there a way to programmatically control when a replica receives its data or possibly
> move it to another collection (without losing data) that updates on a different interval?
> It ideally would be another collection name, call it Week1 ... Week52 ... to avoid a replica
> in the same collection serving old data.
>
> One option we thought of was to create a backup and then restore that into a new clean
> cloud. This has a lot of moving parts and isn't nearly as neat as the Master / Slave controlled
> replication setup. It also has the side effect of potentially taking a very long time to backup
> and restore instead of just copying the indexes like the old M/S setup.
>
> Any ideas or thoughts? Thanks in advance for your help.
> Raja
