lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: copying data from one collection to another collection (solr cloud 521)
Date Tue, 14 Jul 2015 03:55:19 GMT
bq: does offline....

No. I'm talking about "collection aliasing". You can create an entirely
new collection, index to it however you want, then switch to using that
new collection.
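
A minimal sketch of that build-then-swap cycle, using the Collections API `CREATE` and `CREATEALIAS` actions. It only builds the request URLs. The node URL, alias and collection names, config set name and shard count are hypothetical, and the indexing into the new collection happens between the two calls.

```python
from urllib.parse import urlencode

# Hypothetical address of any node in the SolrCloud cluster.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(params: dict) -> str:
    """Build a Collections API request URL from a dict of query parameters."""
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def build_and_swap(alias: str, new_collection: str, config_name: str, num_shards: int) -> list:
    """Return the two Collections API calls of a build-then-swap cycle."""
    create = collections_api_url({
        "action": "CREATE",
        "name": new_collection,
        "numShards": num_shards,
        "collection.configName": config_name,
    })
    # ... index into new_collection here, then point the alias at it.
    # Issuing CREATEALIAS for an alias that already exists repoints it.
    swap = collections_api_url({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": new_collection,
    })
    return [create, swap]
```

Clients that always query the alias (`live` in a call such as `build_and_swap("live", "live_2015w29", "live_conf", 2)`) then see the new collection, and the previous collection can be kept around as a read-only snapshot.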

bq: Any updates to an EXISTING document in the LIVE collection should NOT be
replicated to the previous week(s) snapshot(s)

Then maybe give it a new ID?
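
One way to read that suggestion, sketched under the assumption that archived copies share a collection with LIVE (for example in per-week shards). Solr treats the uniqueKey as a document's identity, and adding a document whose ID already exists in the same index replaces the stored one. Giving each archived copy a derived ID keeps the two copies apart, so an update sent for the live ID cannot land on the archived one. The `<snapshot>_<id>` scheme and the `id` field name are hypothetical.

```python
def snapshot_id(doc_id: str, snapshot: str) -> str:
    """Derive the uniqueKey for a document's archived copy (hypothetical scheme)."""
    return f"{snapshot}_{doc_id}"


def to_snapshot(doc: dict, snapshot: str, id_field: str = "id") -> dict:
    """Return a re-keyed copy of `doc` for the given snapshot.

    The input dict is left untouched, so the live document keeps its ID.
    """
    archived = dict(doc)
    archived[id_field] = snapshot_id(doc[id_field], snapshot)
    return archived
```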

Best,
Erick

On Mon, Jul 13, 2015 at 3:21 PM, Raja Pothuganti
<RPothuganti@competitrack.com> wrote:
> Thank you Erick
>>Actually, my question is why do it this way at all? Why not index
>>directly to your "live" nodes? This is what SolrCloud is built for.
>>You can use "implicit" routing to create shards, say, for each week and
>>age out the ones that are "too old" as well.
>
>
> Any updates to an EXISTING document in the LIVE collection should NOT be
> replicated to the previous week(s) snapshot(s). Think of the snapshot(s)
> as an archive of sorts, searchable independently of LIVE. We're aiming to
> support at most 2 archives of data in the past.
>
>
>>Another option would be to use "collection aliasing" to keep an
>>offline index up to date then switch over when necessary.
>
> Does offline indexing refer to this link?
> https://github.com/cloudera/search/tree/0d47ff79d6ccc0129ffadcb50f9fe0b271f102aa/search-mr
>
>
> Thanks
> Raja
>
>
>
> On 7/13/15, 3:14 PM, "Erick Erickson" <erickerickson@gmail.com> wrote:
>
>>Actually, my question is why do it this way at all? Why not index
>>directly to your "live" nodes? This is what SolrCloud is built for.
>>
>>There's the new backup/restore functionality that's still a work in
>>progress, see: https://issues.apache.org/jira/browse/SOLR-5750
>>
>>You can use "implicit" routing to create shards, say, for each week and
>>age out the ones that are "too old" as well.
>>
>>Another option would be to use "collection aliasing" to keep an
>>offline index up to date then switch over when necessary.
>>
>>I'd really like to know this isn't an XY problem, though. What's the
>>high-level problem you're trying to solve?
>>
>>Best,
>>Erick
>>
>>On Mon, Jul 13, 2015 at 12:49 PM, Raja Pothuganti
>><RPothuganti@competitrack.com> wrote:
>>>
>>> Hi,
>>> We are setting up a new SolrCloud environment with 5.2.1 on Ubuntu
>>>boxes. We currently ingest data into a large collection, call it LIVE.
>>>After the full ingest is done, we trigger a delta ingestion
>>>every 15 minutes to get the documents & data that have changed into this
>>>LIVE instance.
>>>
>>> In Solr 4.X using a Master / Slave setup we had slaves that would
>>>periodically (weekly, or monthly) refresh their data from the Master
>>>rather than every 15 minutes. We're now trying to figure out how to get
>>>this same type of setup using SolrCloud.
>>>
>>> Question(s):
>>> - Is there a way to copy data from one SolrCloud collection into
>>>another quickly and easily?
>>> - Is there a way to programmatically control when a replica receives
>>>its data, or possibly move it to another collection (without losing
>>>data) that updates on a different interval? It would ideally be another
>>>collection name, call it Week1 ... Week52 ..., to avoid a replica in the
>>>same collection serving old data.
>>>
>>> One option we thought of was to create a backup and then restore that
>>>into a new clean cloud. This has a lot of moving parts and isn't nearly
>>>as neat as the Master / Slave controlled replication setup. It also has
>>>the side effect of potentially taking a very long time to back up and
>>>restore instead of just copying the indexes like the old M/S setup.
>>>
>>> Any ideas or thoughts? Thanks in advance for your help.
>>> Raja
>
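
Erick's per-week shard suggestion could look roughly like the sketch below. It assumes a collection created with `router.name=implicit` (the Collections API only allows `CREATESHARD` on collections using the implicit router), one shard per ISO week, and a rolling window of `keep` shards. The collection name, node URL and `w<year>_<week>` shard naming are hypothetical, and documents still have to be sent to the right shard, for example via `router.field` or the `_route_` parameter.

```python
from datetime import date
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical node URL


def collections_api_url(params: dict) -> str:
    """Build a Collections API request URL from a dict of query parameters."""
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def week_shard(d: date) -> str:
    """Name a shard after the ISO year and week of `d`, e.g. 'w2015_29'."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"w{iso_year}_{iso_week:02d}"


def weekly_rotation(collection: str, today: date, existing: list, keep: int) -> list:
    """Return the calls that add this week's shard and age out older ones.

    Shard names sort chronologically (four-digit year, zero-padded week),
    so every name before the newest `keep` is deleted.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    current = week_shard(today)
    calls = []
    if current not in existing:
        calls.append(collections_api_url(
            {"action": "CREATESHARD", "collection": collection, "shard": current}))
    shards = sorted(set(existing) | {current})
    for old in shards[:-keep]:
        calls.append(collections_api_url(
            {"action": "DELETESHARD", "collection": collection, "shard": old}))
    return calls
```

With `keep=3`, the window holds the current week plus the two archived weeks Raja mentions.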

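For the backup-and-restore route Raja describes, the per-core ReplicationHandler offers `command=backup` and, as of Solr 5.2, `command=restore`. The collection-level version is what SOLR-5750 tracks. A rough sketch of the per-core calls, assuming the handler sits at its default `/replication` path and the backup directory is reachable from both cores. The core names and location are hypothetical, and with several shards the calls have to be repeated for each shard.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical node URL


def replication_url(core: str, params: dict) -> str:
    """Build a ReplicationHandler request URL for a single core."""
    return f"{SOLR_BASE}/{core}/replication?{urlencode(params)}"


def backup_restore_calls(source_core: str, target_core: str, location: str, name: str) -> dict:
    """Return the ReplicationHandler calls for copying one core's index."""
    return {
        # Write a named snapshot of the source core's index under `location`.
        "backup": replication_url(
            source_core, {"command": "backup", "location": location, "name": name}),
        # The backup runs in the background; "details" reports its progress.
        "backup_status": replication_url(source_core, {"command": "details"}),
        # Load the named snapshot into the target core.
        "restore": replication_url(
            target_core, {"command": "restore", "location": location, "name": name}),
        # Restore also runs in the background; poll until it reports success.
        "restore_status": replication_url(target_core, {"command": "restorestatus"}),
    }
```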