incubator-couchdb-user mailing list archives

From Andrey Kuprianov <andrey.koupria...@gmail.com>
Subject Re: Mass updates
Date Sat, 11 May 2013 06:27:59 GMT
We do that, and we have a cron job that touches the views every 5 minutes. It's just that at
that particular time we had to insert those 150k docs in one go (we were migrating from MySQL).

On 11 May, 2013, at 1:02 PM, Benoit Chesneau <bchesneau@gmail.com> wrote:

> On May 9, 2013 1:17 PM, "Andrey Kuprianov" <andrey.kouprianov@gmail.com> wrote:
>> 
>> Rebuilding the views mentioned by James is hell! And the more docs and
>> views you have, the longer your views will take to catch up with the
>> updates. We don't have the best of servers, but ours (dedicated) took
>> several hours to rebuild our views (and we don't have many) after we
>> inserted ~150k documents (we use full-text search with Lucene as well,
>> so it also contributed to the overall server slowdown).
>> 
>> So my suggestion is:
>> 
>> 1. When you want to migrate your stuff, make a copy of your db.
>> 2. Do the migration on the copy.
>> 3. Allow the views to rebuild (you need to query a single view in each
>> design document once to trigger the views to start catching up with the
>> updates). You'd probably ask whether it is possible to limit Couch's
>> resource usage while views are rebuilding, but I don't have an answer to
>> that question. Maybe someone else can help here...
>> 4. Switch database pointer from one DB to another.
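
Step 3 above can be sketched roughly as follows. This is only an illustration, not anything posted in the thread: the function names, the base URL and the database name are assumptions. It relies on the fact that all views in one design document share a single index, so querying one view per design document is enough to start the rebuild.

```python
import json
import urllib.parse
import urllib.request


def one_view_per_design_doc(design_docs):
    """Pick one (design_name, view_name) pair for each design doc that has views."""
    pairs = []
    for ddoc in design_docs:
        views = ddoc.get("views") or {}
        if not views:
            continue  # e.g. a design doc holding only update handlers
        design_name = ddoc["_id"].split("/", 1)[1]  # strip the "_design/" prefix
        pairs.append((design_name, sorted(views)[0]))
    return pairs


def warm_url(base_url, db, design_name, view_name):
    """Build a view query URL; limit=0 asks for no rows but still updates the index."""
    quote = lambda s: urllib.parse.quote(s, safe="")
    return "%s/%s/_design/%s/_view/%s?limit=0" % (
        base_url.rstrip("/"), quote(db), quote(design_name), quote(view_name))


def fetch_design_docs(base_url, db):
    """List design documents via _all_docs over the "_design/" key range."""
    params = urllib.parse.urlencode({
        "include_docs": "true",
        "startkey": json.dumps("_design/"),
        "endkey": json.dumps("_design0"),
    })
    url = "%s/%s/_all_docs?%s" % (
        base_url.rstrip("/"), urllib.parse.quote(db, safe=""), params)
    with urllib.request.urlopen(url) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return [row["doc"] for row in body["rows"]]


def warm_all_views(base_url, db):
    """Query one view per design doc; each call blocks until that index has caught up."""
    for design_name, view_name in one_view_per_design_doc(fetch_design_docs(base_url, db)):
        with urllib.request.urlopen(warm_url(base_url, db, design_name, view_name)) as resp:
            resp.read()
```

Calling `warm_all_views("http://localhost:5984", "mydb_copy")` (both values hypothetical) against the migrated copy would then run before switching the database pointer in step 4.
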
> 
> You don't need to wait until all the docs are in to trigger the view update.
> Just trigger it more often, so view calculation will happen on a smaller set.
> 
> You can even make it parallel by using different ddocs.
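
A minimal sketch of that advice, assuming the documents are loaded through `_bulk_docs` in batches and a view is queried every few batches so the indexer only ever has a small backlog. `load_in_batches`, `post_bulk_docs` and the batch sizes are made-up names and values for illustration.

```python
import json
import urllib.request


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_in_batches(docs, post_bulk, touch_views, batch_size=1000, touch_every=5):
    """Send docs in batches and trigger a view update every `touch_every` batches.

    post_bulk(batch) and touch_views() are caller-supplied callables
    (hypothetical hooks, so the loop can be tested without a server).
    Returns the number of documents sent.
    """
    sent = 0
    batches_since_touch = 0
    for batch in chunked(docs, batch_size):
        post_bulk(batch)
        sent += len(batch)
        batches_since_touch += 1
        if batches_since_touch == touch_every:
            touch_views()
            batches_since_touch = 0
    if batches_since_touch:
        touch_views()  # let the views catch up with the final partial group
    return sent


def post_bulk_docs(base_url, db, batch):
    """POST one batch to the _bulk_docs endpoint and return the per-doc results."""
    req = urllib.request.Request(
        "%s/%s/_bulk_docs" % (base_url.rstrip("/"), db),
        data=json.dumps({"docs": batch}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In real use `post_bulk` would wrap `post_bulk_docs`, and `touch_views` would query one view per design document. Since each design document has its own index, those queries could be issued concurrently.
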
>> 
>> 
>> On Thu, May 9, 2013 at 1:41 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>> 
>>> On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
>>> <ckoppel@alumni.gwu.edu> wrote:
>>>> I am trying to understand whether Couch is the way to go to meet some of
>>>> my organization's needs.  It seems pretty terrific.
>>>> The main concern I have is maintaining a consistent state across code
>>>> releases.  Presumably, our data model will change over the course of
>>>> time, and when it does, we need to make the several million old
>>>> documents conform to the new model.
>>>> 
>>>> Although I would love to pipe a view through an update handler and call
>>>> it a day, I don't believe that option exists.  The two ways I
>>>> understand to do this are:
>>>> 
>>>> 1. Query all documents, update each doc client-side, and POST those
>>>> changes to the _bulk_docs API (presumably this should be done in batches)
>>>> 2. Query the ids for all docs, and one at a time, PUT them through an
>>>> update handler
>>> 
>>> You are correct that there's no server-side way to do a migration like
>>> the one you're asking for.
>>> 
>>> The general pattern for these things is to write a view that only
>>> includes the documents that need to be changed and then write
>>> something that goes through and processes each doc in the view to the
>>> desired form (that removes it from the view). This way you can easily
>>> know when you're done working. It's definitely possible to write
>>> something that stores state and/or just brute-forces a db scan each
>>> time you run the migration.
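
The pattern described above might look something like this. It is a sketch under stated assumptions, not code from the thread: the `schema_version` field, the map function and the `fetch_pending` / `save_bulk` callables are all hypothetical. The view emits only documents that still need changing, so a migrated document drops out of it and an empty view means the work is done.

```python
# Hypothetical map function for a "needs migration" view (CouchDB map
# functions are JavaScript). A doc is emitted only while it is still
# on the old schema.
MAP_NEEDS_MIGRATION = """
function (doc) {
  if (!doc.schema_version || doc.schema_version < 2) {
    emit(doc._id, null);
  }
}
"""


def migrate(doc):
    """Hypothetical transformation to the new model; keeps _id and _rev intact."""
    new_doc = dict(doc)
    new_doc["schema_version"] = 2
    return new_doc


def run_migration(fetch_pending, save_bulk, limit=500):
    """Process the view in batches until it is empty.

    fetch_pending(limit) is assumed to return up to `limit` full docs from the
    view (for example a view query with include_docs=true and limit=N).
    save_bulk(docs) is assumed to write them (for example via _bulk_docs) and
    return how many were accepted.
    """
    total = 0
    while True:
        docs = fetch_pending(limit)
        if not docs:
            return total  # view is empty: nothing left to migrate
        written = save_bulk([migrate(d) for d in docs])
        if written == 0:
            # Nothing was accepted (e.g. every doc conflicted), so the view
            # would never shrink; stop instead of looping forever.
            raise RuntimeError("no documents were written in this round")
        total += written
```

Because the view itself tracks what is left, the script can be stopped and restarted at any point without storing extra state.
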
>>> 
>>> Performance-wise, your first suggestion would probably be the most
>>> performant. Depending on document sizes and latencies, it may be
>>> possible to get better numbers using an update handler, but I doubt
>>> it unless you have huge docs and a super slow, high-latency
>>> connection.
>>> 
>>>> Are these options reasonably performant?  If we have to do a mass-update
>>>> once a deployment, it's not terrible if it's not lightning-speed, but it
>>>> shouldn't take terribly long.  Also, I have read that update handlers
>>>> have indexes built against them.  If this is a fire-once option, is that
>>>> worthwhile?
>>> 
>>> I'm not sure what you mean by update handlers having indexes built
>>> against them. That doesn't match anything that currently exists in
>>> CouchDB.
>>> 
>>>> Which option is better?  Is there an even better way?
>>> 
>>> There's nothing better than the general ideas you listed.
>>> 
>>>> Thanks,
>>>> Charles
>>> 
