lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: How to migrate content of a collection to a new collection
Date Wed, 23 Jul 2014 15:13:12 GMT
Per:

Given that you said the field redefinition also includes routing info, I
don't see any other way than re-indexing each collection. That said, could
you use collection aliasing and migrate one collection at a time?
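
As a rough illustration of the aliasing idea, the sketch below builds the Collections API CREATEALIAS request that would point an alias at a freshly re-indexed collection. The host, alias name, and collection name are hypothetical, and it assumes clients address the alias rather than the underlying collection name.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collection):
    """Build a Collections API CREATEALIAS request URL.

    solr_base, alias and collection are illustrative placeholders;
    nothing here is taken from the poster's actual setup.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,              # name that clients query
        "collections": collection,  # collection the alias resolves to
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


# Hypothetical usage: after re-indexing "events" into "events_v2",
# repoint the alias so queries hit the new collection.
print(create_alias_url("http://localhost:8983/solr", "events", "events_v2"))
```

Issuing that request once per collection, after its re-indexed twin is complete, would let the migration proceed one collection at a time.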

Best,
Erick


On Tue, Jul 22, 2014 at 11:45 PM, Per Steffensen <steff@designware.dk>
wrote:

> Hi
>
> We have numerous collections each with numerous shards spread across
> numerous machines. We just discovered that all documents have a field with
> a wrong value and besides that we would like to add a new field to all
> documents
> * The field with the wrong value is a long, DocValued, Indexed and Stored.
> Some (about half) of the documents need to have a constant added to their
> current value
> * The field we want to add will be an int, DocValued, Indexed and Stored.
> Needs to be added to all documents, but will have different values among
> the documents
>
> How to achieve our goal in the easiest possible way?
>
> We thought about spooling/streaming from the existing collection into a
> "twin"-collection, then delete the existing collection and finally rename
> the "twin"-collection to have the same name as the original collection.
> Basically indexing all documents again. If that is the easiest way, how do
> we query so that all documents are streamed back? We cannot just run
> a *:* query that loads everything into memory and then index from there,
> because we have billions of documents (not enough memory). Please note that
> we are on 4.4, which does not contain the new CURSOR-feature. Please also
> note that speed is an important factor for us.
>
> Guess this could also be achieved by doing 1-1 migration on shard-level
> instead of collection-level, keeping everything in the new collections on
> the same machine as where they lived in the old collections. That could
> probably complete faster than the 1-1 on collection-level approach. But
> this 1-1 on shard-level approach is not very good for us, because the long
> field we need to change is also part of the id (controlling the routing to
> a particular shard) and therefore actually we also need to change the id on
> all documents. So if we do the 1-1 on shard-level approach, we will end up
> having documents in shards that they actually do not belong to (they would not
> have been routed there by the routing system in Solr). We might be able to
> live with this disadvantage if 1-1 on shard-level can be easily achieved
> much faster than the 1-1 on collection-level.
>
> Any input is very much appreciated! Thanks
>
> Regards, Per Steffensen
>
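
One way to walk a large collection without the cursor feature is to sort on the unique key and restrict each request to keys greater than the last one already seen, so `start` stays at 0. This is only a sketch of that general technique, not something proposed in the thread. It assumes the unique key field is named `id`, that it is a sortable string field, and that sorting on it is affordable at this scale, none of which is confirmed by the original message.

```python
def next_page_params(last_id=None, rows=1000, key_field="id"):
    """Build Solr query parameters for one page of a key-ordered scan.

    key_field="id" and rows=1000 are assumptions for illustration.
    """
    params = {
        "q": "*:*",
        "sort": f"{key_field} asc",  # stable order on the unique key
        "rows": rows,
        "start": 0,                  # never page deep; the filter moves instead
    }
    if last_id is not None:
        # Quote the value and escape backslashes and quotes so it is
        # treated as a literal term in the range query.
        escaped = last_id.replace("\\", "\\\\").replace('"', '\\"')
        # Exclusive lower bound: only keys strictly after the last one seen.
        params["fq"] = f'{key_field}:{{"{escaped}" TO *]'
    return params


# First page has no filter; later pages resume after the last key returned.
print(next_page_params())
print(next_page_params(last_id="doc42", rows=500))
```

Each response's final `id` would be passed as `last_id` for the next request, and the scan ends when a page comes back with fewer than `rows` documents.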
