couchdb-user mailing list archives

From James Marca <jma...@translab.its.uci.edu>
Subject Re: Mass updates
Date Wed, 15 May 2013 06:17:54 GMT
On Mon, May 13, 2013 at 02:24:50AM -0400, Lance Carlson wrote:
> Oops, urls:
> 
> https://github.com/lancecarlson/couchin.go
> https://github.com/lancecarlson/couchout.go
> 
> Feedback appreciated!
> 

I don't understand the use case here, so I'd appreciate an example.
If you can define a view or use _all_docs to pull docs from Couch
into Redis, why use Redis at all?  Why not just use Couch directly,
load the docs into RAM, and process them?
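
For instance, something along these lines (an untested sketch; "mydb" and the
processing step are placeholders) is roughly what I mean by using Couch
directly:

    var http = require('http');

    // pull every doc in one request and process it in memory
    http.get('http://localhost:5984/mydb/_all_docs?include_docs=true',
      function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
          JSON.parse(body).rows.forEach(function (row) {
            // do whatever processing you need on row.doc here
          });
        });
      });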

I feel like I'm missing something obvious. 

Also, I've never stressed Redis much.  What happens when you bump up
against RAM limits?

James
> 
> On Mon, May 13, 2013 at 2:24 AM, Lance Carlson <lancecarlson@gmail.com>wrote:
> 
> > Made a lot of updates to my couchout project. It now includes a couchin
> > project as well. I might create another project for updating, but it's
> > pretty easy to write a Node.js script (or one in any language, for that
> > matter) that connects to Redis and decodes and encodes base64.
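> >
> > Roughly something like this (untested sketch; assumes the callback-style
> > node_redis client and a hypothetical "docs:*" key layout holding
> > base64-encoded JSON values):
> >
> > var redis = require('redis');
> > var client = redis.createClient();
> >
> > client.keys('docs:*', function (err, keys) {
> >   keys.forEach(function (key) {
> >     client.get(key, function (err, val) {
> >       // decode, transform, and re-encode the doc
> >       var doc = JSON.parse(Buffer.from(val, 'base64').toString('utf8'));
> >       doc.migrated = true; // placeholder transformation
> >       client.set(key, Buffer.from(JSON.stringify(doc)).toString('base64'));
> >     });
> >   });
> > });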
> >
> >
> > On Sat, May 11, 2013 at 2:27 AM, Andrey Kuprianov <
> > andrey.kouprianov@gmail.com> wrote:
> >
> >> We do that, and we have a cron job to touch the view every 5 min. It's
> >> just that at that particular time we had to insert those 150k docs in one
> >> go (we were migrating from MySQL).
> >>
> >> Sent from my iPhone
> >>
> >> On 11 May, 2013, at 1:02 PM, Benoit Chesneau <bchesneau@gmail.com> wrote:
> >>
> >> > On May 9, 2013 1:17 PM, "Andrey Kuprianov" <andrey.kouprianov@gmail.com>
> >> > wrote:
> >> >>
> >> >> Rebuilding the views mentioned by James is hell! And the more docs and
> >> >> views you have, the longer it takes for your views to catch up with the
> >> >> updates. We don't have the best of servers, but ours (dedicated) took
> >> >> several hours to rebuild our views (not too many, either) after we
> >> >> inserted ~150k documents (we use full-text search with Lucene as well,
> >> >> so it also contributed to the overall server slowdown).
> >> >>
> >> >> So my suggestion is:
> >> >>
> >> >> 1. Once you want to migrate your stuff, make a copy of your db.
> >> >> 2. Do the migration on the copy.
> >> >> 3. Allow the views to rebuild (you need to query one view in each design
> >> >> document once to trigger the views to start catching up with the
> >> >> updates; see the sketch after this list). You'd probably ask whether
> >> >> it's possible to limit Couch's resource usage while views are
> >> >> rebuilding, but I don't have an answer to that question. Maybe someone
> >> >> else can help here...
> >> >> 4. Switch the database pointer from one DB to the other.
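> >> >>
> >> >> For step 3, a rough sketch (the ddoc names, view name, and db are
> >> >> placeholders; querying any one view in a design doc builds all views in
> >> >> that ddoc):
> >> >>
> >> >> var http = require('http');
> >> >>
> >> >> ['ddoc_a', 'ddoc_b'].forEach(function (ddoc) {
> >> >>   // limit=0 avoids transferring rows but still triggers the index build
> >> >>   http.get('http://localhost:5984/mydb/_design/' + ddoc +
> >> >>            '/_view/some_view?limit=0', function (res) {
> >> >>     console.log(ddoc + ' -> ' + res.statusCode);
> >> >>   });
> >> >> });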
> >> >
> >> > You don't need to wait until all the docs are there to trigger the view
> >> > update. Just trigger it more often, so the view calculation will happen
> >> > on a smaller set.
> >> >
> >> > You can even make it parallel by using different ddocs.
> >> >>
> >> >>
> >> >> On Thu, May 9, 2013 at 1:41 PM, Paul Davis <paul.joseph.davis@gmail.com>
> >> >> wrote:
> >> >>
> >> >>> On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
> >> >>> <ckoppel@alumni.gwu.edu> wrote:
> >> >>>> I am trying to understand whether Couch is the way to go to meet some
> >> >>>> of my organization's needs.  It seems pretty terrific.
> >> >>>> The main concern I have is maintaining a consistent state across code
> >> >>>> releases.  Presumably, our data model will change over the course of
> >> >>>> time, and when it does, we need to make the several million old
> >> >>>> documents conform to the new model.
> >> >>>>
> >> >>>> Although I would love to pipe a view through an update handler and
> >> >>>> call it a day, I don't believe that option exists.  The two ways I
> >> >>>> understand to do this are:
> >> >>>>
> >> >>>> 1. Query all documents, update each doc client-side, and POST those
> >> >>>> changes to the _bulk_docs API (presumably this should be done in
> >> >>>> batches); a sketch follows below.
> >> >>>> 2. Query the ids for all docs and, one at a time, PUT them through an
> >> >>>> update handler.
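> >> >>>>
> >> >>>> For option 1, I'm picturing something along these lines (untested
> >> >>>> sketch; the db name, batch size, and transform are placeholders):
> >> >>>>
> >> >>>> var http = require('http');
> >> >>>>
> >> >>>> function bulkUpdate(docs) {
> >> >>>>   // one POST to _bulk_docs writes the whole batch back
> >> >>>>   var req = http.request({
> >> >>>>     host: 'localhost', port: 5984, path: '/mydb/_bulk_docs',
> >> >>>>     method: 'POST', headers: { 'Content-Type': 'application/json' }
> >> >>>>   }, function (res) { console.log('_bulk_docs -> ' + res.statusCode); });
> >> >>>>   req.end(JSON.stringify({ docs: docs }));
> >> >>>> }
> >> >>>>
> >> >>>> // fetch one batch (docs include _id and _rev), transform client-side
> >> >>>> http.get('http://localhost:5984/mydb/_all_docs' +
> >> >>>>          '?include_docs=true&limit=1000', function (res) {
> >> >>>>   var body = '';
> >> >>>>   res.on('data', function (c) { body += c; });
> >> >>>>   res.on('end', function () {
> >> >>>>     var docs = JSON.parse(body).rows.map(function (row) {
> >> >>>>       row.doc.schema_version = 2; // placeholder migration
> >> >>>>       return row.doc;
> >> >>>>     });
> >> >>>>     bulkUpdate(docs);
> >> >>>>   });
> >> >>>> });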
> >> >>>
> >> >>> You are correct that there's no server-side way to do a migration like
> >> >>> the one you're asking for.
> >> >>>
> >> >>> The general pattern for these things is to write a view that only
> >> >>> includes the documents that need to be changed and then write
> >> >>> something that goes through and processes each doc in the view into
> >> >>> the desired form (which removes it from the view). This way you can
> >> >>> easily know when you're done working. It's definitely possible to
> >> >>> write something that stores state and/or just brute-forces a db scan
> >> >>> each time you run the migration.
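> >> >>>
> >> >>> As a sketch of that kind of view (the schema_version field is just an
> >> >>> example of how you might detect an unmigrated doc):
> >> >>>
> >> >>> // map function for a "pending" view: it emits only docs that still
> >> >>> // need the migration, so the view drains as you process it
> >> >>> function (doc) {
> >> >>>   if (!doc.schema_version || doc.schema_version < 2) {
> >> >>>     emit(doc._id, null);
> >> >>>   }
> >> >>> }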
> >> >>>
> >> >>> Performance-wise, your first suggestion would probably be the most
> >> >>> performant, although depending on document sizes and latencies it may
> >> >>> be possible to get better numbers using an update handler. But I doubt
> >> >>> it unless you have huge docs and a super slow connection with high
> >> >>> latencies.
> >> >>>
> >> >>>> Are these options reasonably performant?  If we have to do a
> >> >>>> mass-update once per deployment, it's not terrible if it's not
> >> >>>> lightning-fast, but it shouldn't take terribly long.  Also, I have
> >> >>>> read that update handlers have indexes built against them.  If this is
> >> >>>> a fire-once option, is that worthwhile?
> >> >>>
> >> >>> I'm not sure what you mean by update handlers having indexes built
> >> >>> against them. That doesn't match anything that currently exists in
> >> >>> CouchDB.
> >> >>>
> >> >>>> Which option is better?  Is there an even better way?
> >> >>>
> >> >>> There's nothing better than the general ideas you listed.
> >> >>>
> >> >>>> Thanks,
> >> >>>> Charles
> >> >>>
> >>
> >
> >

