couchdb-user mailing list archives

From Lance Carlson <lancecarl...@gmail.com>
Subject Re: Mass updates
Date Mon, 13 May 2013 06:24:50 GMT
Oops, urls:

https://github.com/lancecarlson/couchin.go
https://github.com/lancecarlson/couchout.go

Feedback appreciated!


On Mon, May 13, 2013 at 2:24 AM, Lance Carlson <lancecarlson@gmail.com> wrote:

> Made a lot of updates to my couchout project. It now includes a couchin
> project as well. I might create another project for updating, but it's
> pretty easy to write a Node.js script (or a script in any language, for
> that matter) that connects to Redis and decodes and encodes base64.
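>
> A rough sketch of that idea in Python (the Redis list names "couchout" and
> "couchin" and the transform are placeholders; uses the redis-py client):
>
>     import base64
>     import json
>
>     import redis  # redis-py
>
>     r = redis.Redis(host="localhost", port=6379)
>
>     # Pop base64-encoded docs, change them, and queue them back up.
>     while True:
>         raw = r.lpop("couchout")  # base64-encoded JSON doc, or None
>         if raw is None:
>             break
>         doc = json.loads(base64.b64decode(raw))
>         doc["migrated"] = True  # whatever change you need
>         r.rpush("couchin", base64.b64encode(json.dumps(doc).encode()))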
>
>
> On Sat, May 11, 2013 at 2:27 AM, Andrey Kuprianov
> <andrey.kouprianov@gmail.com> wrote:
>
>> We do that, and we have a cron job to touch the views every 5 minutes.
>> It's just that at that particular time we had to insert those 150k docs
>> in one go (we were migrating from MySQL).
>>
>> Sent from my iPhone
>>
>> On 11 May, 2013, at 1:02 PM, Benoit Chesneau <bchesneau@gmail.com> wrote:
>>
>> > On May 9, 2013 1:17 PM, "Andrey Kuprianov"
>> > <andrey.kouprianov@gmail.com> wrote:
>> >>
>> >> Rebuilding the views mentioned by James is hell! And the more docs and
>> >> views you have, the longer your views will take to catch up with the
>> >> updates. We don't have the best of servers, but ours (dedicated) took
>> >> several hours to rebuild our views (not too many of them, either) after
>> >> we inserted ~150k documents (we use full-text search with Lucene as
>> >> well, so it also contributed to the overall server slowdown).
>> >>
>> >> So my suggestion is:
>> >>
>> >> 1. Once you want to migrate your stuff, make a copy of your db.
>> >> 2. Do the migration on the copy.
>> >> 3. Allow the views to rebuild (you need to query a single view from
>> >> each design document once to trigger the views to start catching up
>> >> with the updates; see the sketch after this list). You'd probably ask
>> >> whether it's possible to limit CouchDB's resource usage while views are
>> >> rebuilding, but I don't have an answer to that question. Maybe someone
>> >> else can help here...
>> >> 4. Switch the database pointer from one DB to another.
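>> >>
>> >> Here's the sketch for step 3 (Python; the copy's URL is a placeholder):
>> >>
>> >>     import json
>> >>     from urllib.request import urlopen
>> >>
>> >>     BASE = "http://localhost:5984/mydb_copy"  # placeholder copy db
>> >>
>> >>     # List every design doc, then hit one view per ddoc with limit=0;
>> >>     # any non-stale query makes that ddoc's whole index catch up.
>> >>     url = (BASE + "/_all_docs?include_docs=true"
>> >>            "&startkey=%22_design/%22&endkey=%22_design0%22")
>> >>     for row in json.load(urlopen(url))["rows"]:
>> >>         ddoc = row["doc"]
>> >>         views = list(ddoc.get("views", {}))
>> >>         if views:
>> >>             name = ddoc["_id"].split("/", 1)[1]
>> >>             urlopen(BASE + "/_design/" + name +
>> >>                     "/_view/" + views[0] + "?limit=0").read()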
>> >
>> > You don't need to wait for all the docs to be there to trigger the view
>> > update. Just trigger it more often, so view calculation will happen on a
>> > smaller set.
>> >
>> > You can even make it parallel by using different ddocs.
>> >>
>> >>
>> >> On Thu, May 9, 2013 at 1:41 PM, Paul Davis
>> >> <paul.joseph.davis@gmail.com> wrote:
>> >>
>> >>> On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
>> >>> <ckoppel@alumni.gwu.edu> wrote:
>> >>>> I am trying to understand whether Couch is the way to go to meet some
>> >>>> of my organization's needs.  It seems pretty terrific.
>> >>>> The main concern I have is maintaining a consistent state across code
>> >>>> releases.  Presumably, our data model will change over the course of
>> >>>> time, and when it does, we need to make the several million old
>> >>>> documents conform to the new model.
>> >>>>
>> >>>> Although I would love to pipe a view through an update handler and
>> >>>> call it a day, I don't believe that option exists.  The two ways I
>> >>>> understand to do this are:
>> >>>>
>> >>>> 1. Query all documents, update each doc client-side, and PUT those
>> >>>> changes through the _bulk_docs API (presumably this should be done in
>> >>>> batches)
>> >>>> 2. Query the ids for all docs, and one at a time, PUT them through an
>> >>>> update handler
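>> >>>>
>> >>>> For concreteness, option 1 might look roughly like this for a single
>> >>>> batch (a Python sketch; the db URL and the "schema_version" change
>> >>>> are placeholders, not our real model):
>> >>>>
>> >>>>     import json
>> >>>>     from urllib.request import Request, urlopen
>> >>>>
>> >>>>     DB = "http://localhost:5984/mydb"  # placeholder
>> >>>>
>> >>>>     # Fetch a batch of docs and change them client-side...
>> >>>>     url = DB + "/_all_docs?include_docs=true&limit=500"
>> >>>>     docs = []
>> >>>>     for row in json.load(urlopen(url))["rows"]:
>> >>>>         doc = row["doc"]
>> >>>>         if not doc["_id"].startswith("_design/"):
>> >>>>             doc["schema_version"] = 2  # made-up model change
>> >>>>             docs.append(doc)
>> >>>>
>> >>>>     # ...and write them all back in one request.
>> >>>>     req = Request(DB + "/_bulk_docs",
>> >>>>                   data=json.dumps({"docs": docs}).encode(),
>> >>>>                   headers={"Content-Type": "application/json"})
>> >>>>     print(json.load(urlopen(req)))  # per-doc results; check conflicts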
>> >>>
>> >>> You are correct that there's no server-side way to do the kind of
>> >>> migration you're asking for.
>> >>>
>> >>> The general pattern for these things is to write a view that only
>> >>> includes the documents that need to be changed, and then write
>> >>> something that goes through and processes each doc in the view into
>> >>> the desired form (which removes it from the view). This way you can
>> >>> easily know when you're done working. It's definitely possible to
>> >>> write something that stores state and/or just brute-force a db scan
>> >>> each time you run the migration.
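>> >>>
>> >>> As a rough sketch of that pattern (Python; the ddoc name, the
>> >>> "schema_version" marker, and the db URL are all made up):
>> >>>
>> >>>     import json
>> >>>     from urllib.request import Request, urlopen
>> >>>
>> >>>     DB = "http://localhost:5984/mydb"  # placeholder
>> >>>
>> >>>     # One-time setup: a view that only emits docs in the old shape.
>> >>>     ddoc = {
>> >>>         "_id": "_design/migration",
>> >>>         "views": {"pending": {"map":
>> >>>             "function(doc) {"
>> >>>             "  if (!doc.schema_version) emit(doc._id, null);"
>> >>>             "}"
>> >>>         }},
>> >>>     }
>> >>>     urlopen(Request(DB, data=json.dumps(ddoc).encode(),
>> >>>                     headers={"Content-Type": "application/json"}))
>> >>>
>> >>>     # Drain the view; each fixed doc drops out of it, so an empty
>> >>>     # result means the migration is done.
>> >>>     while True:
>> >>>         rows = json.load(urlopen(DB +
>> >>>             "/_design/migration/_view/pending"
>> >>>             "?include_docs=true&limit=500"))["rows"]
>> >>>         if not rows:
>> >>>             break
>> >>>         docs = []
>> >>>         for row in rows:
>> >>>             doc = row["doc"]
>> >>>             doc["schema_version"] = 2  # the change that removes it
>> >>>             docs.append(doc)
>> >>>         urlopen(Request(DB + "/_bulk_docs",
>> >>>                         data=json.dumps({"docs": docs}).encode(),
>> >>>                         headers={"Content-Type": "application/json"}))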
>> >>>
>> >>> Performance-wise, your first suggestion would probably be the fastest.
>> >>> Depending on document sizes and latencies it may be possible to get
>> >>> better numbers using an update handler, but I doubt it unless you have
>> >>> huge docs and a super slow, high-latency connection.
>> >>>
>> >>>> Are these options reasonably performant?  If we have to do a
>> >>>> mass update once per deployment, it's not terrible if it's not
>> >>>> lightning-fast, but it shouldn't take terribly long.  Also, I have
>> >>>> read that update handlers have indexes built against them.  If this
>> >>>> is a fire-once option, is that worthwhile?
>> >>>
>> >>> I'm not sure what you mean that update handlers have indexes built
>> >>> against them. That doesn't match anything that currently exists in
>> >>> CouchDB.
>> >>>
>> >>>> Which option is better?  Is there an even better way?
>> >>>
>> >>> There's nothing better than the general ideas you've listed.
>> >>>
>> >>>> Thanks,
>> >>>> Charles
>> >>>
>>
>
>
