incubator-couchdb-user mailing list archives

From Lance Carlson <lancecarl...@gmail.com>
Subject Re: Mass updates
Date Wed, 15 May 2013 06:26:09 GMT
I use Redis to stick docs into RAM. Once they're in RAM, I like to use node
to parse the docs in the way I want them, then purge the dataset. Couchout
pulls them into RAM via Redis, and couchin bulk_saves them back into CouchDB
from Redis. I tried to make the couchout/in tools language agnostic.

Anyway, you can certainly use whatever language you want and load all of
the docs into memory. Typically, though, if you're dealing with a language
that isn't statically compiled, you'll run into situations where Redis is
more efficient.


On Wed, May 15, 2013 at 2:17 AM, James Marca <jmarca@translab.its.uci.edu> wrote:

> On Mon, May 13, 2013 at 02:24:50AM -0400, Lance Carlson wrote:
> > Oops, urls:
> >
> > https://github.com/lancecarlson/couchin.go
> > https://github.com/lancecarlson/couchout.go
> >
> > Feedback appreciated!
> >
>
> I don't understand the use case here, so I'd appreciate an example.
> If you can define a view or use all_docs to pull docs from couch and
> into redis, why use redis at all?  Why not just use couch directly,
> load docs into ram, and process them?
>
> I feel like I'm missing something obvious.
>
> Also, I've never stressed Redis much.  What happens when you bump up
> against RAM limits?
>
> James
> >
> > On Mon, May 13, 2013 at 2:24 AM, Lance Carlson <lancecarlson@gmail.com>
> > wrote:
> >
> > > Made a lot of updates to my couchout project. It now includes a couchin
> > > project as well. Might create another project for updating, but it's
> > > pretty easy for someone to write a node js script (or any language for
> > > that matter) that connects to redis and decodes/encodes base64.
> > >
> > >
> > > On Sat, May 11, 2013 at 2:27 AM, Andrey Kuprianov
> > > <andrey.kouprianov@gmail.com> wrote:
> > >
> > >> We do that and we have a cron to touch the view every 5 min. It's just
> > >> that at that particular time we had to insert those 150k in one go (we
> > >> were migrating from mysql).
> > >>
> > >> Sent from my iPhone
> > >>
> > >> On 11 May, 2013, at 1:02 PM, Benoit Chesneau <bchesneau@gmail.com>
> > >> wrote:
> > >>
> > >> > On May 9, 2013 1:17 PM, "Andrey Kuprianov" <andrey.kouprianov@gmail.com>
> > >> > wrote:
> > >> >>
> > >> >> Rebuilding the views mentioned by James is hell! And the more docs
> > >> >> and views you have, the longer your views will take to catch up with
> > >> >> the updates. We don't have the best of servers, but ours (dedicated)
> > >> >> took several hours to rebuild our views (not too many, either) after
> > >> >> we inserted ~150k documents (we use full-text search with Lucene as
> > >> >> well, so it also contributed to the overall server slowdown).
> > >> >>
> > >> >> So my suggestion is:
> > >> >>
> > >> >> 1. Once you want to migrate your stuff, make a copy of your db.
> > >> >> 2. Do the migration on the copy.
> > >> >> 3. Allow the views to rebuild (you need to query a single view from
> > >> >> each design document once to trigger the views to start catching up
> > >> >> with the updates). You'd probably ask whether it's possible to limit
> > >> >> Couch's resource usage while views are rebuilding, but I don't have
> > >> >> an answer to that question. Maybe someone else can help here...
> > >> >> 4. Switch the database pointer from one DB to the other.
> > >> >
> > >> > You don't need to wait until all the docs are there to trigger the
> > >> > view update; just trigger it more often, so view calculation will
> > >> > happen on a smaller set.
> > >> >
> > >> > You can even parallelize it by using different ddocs.
> > >> >>
> > >> >>
> > >> >> On Thu, May 9, 2013 at 1:41 PM, Paul Davis <paul.joseph.davis@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >>> On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
> > >> >>> <ckoppel@alumni.gwu.edu> wrote:
> > >> >>>> I am trying to understand whether Couch is the way to go to meet
> > >> >>>> some of my organization's needs.  It seems pretty terrific.
> > >> >>>> The main concern I have is maintaining a consistent state across
> > >> >>>> code releases.  Presumably, our data model will change over the
> > >> >>>> course of time, and when it does, we need to make the several
> > >> >>>> million old documents conform to the new model.
> > >> >>>>
> > >> >>>> Although I would love to pipe a view through an update handler and
> > >> >>>> call it a day, I don't believe that option exists.  The two ways I
> > >> >>>> understand to do this are:
> > >> >>>>
> > >> >>>> 1. Query all documents, update each doc client-side, and PUT those
> > >> >>>> changes to the _bulk_docs API (presumably this should be done in
> > >> >>>> batches)
> > >> >>>> 2. Query the ids for all docs, and one at a time, PUT them through
> > >> >>>> an update handler
> > >> >>>
> > >> >>> You are correct that there's no server-side way to do a migration
> > >> >>> like you're asking for.
> > >> >>>
> > >> >>> The general pattern for these things is to write a view that only
> > >> >>> includes the documents that need to be changed, and then write
> > >> >>> something that goes through and processes each doc in the view into
> > >> >>> the desired form (which removes it from the view). This way you can
> > >> >>> easily know when you're done working. It's definitely possible to
> > >> >>> write something that stores state and/or just brute-forces a db
> > >> >>> scan each time you run the migration.
> > >> >>>
> > >> >>> Performance-wise, your first suggestion would probably be the most
> > >> >>> performant, although depending on document sizes and latencies it
> > >> >>> may be possible to get better numbers using an update handler. I
> > >> >>> doubt it, though, unless you have huge docs and a super slow
> > >> >>> connection with high latencies.
> > >> >>>
> > >> >>>> Are these options reasonably performant?  If we have to do a
> > >> >>>> mass-update once per deployment, it's not terrible if it's not
> > >> >>>> lightning-speed, but it shouldn't take terribly long.  Also, I
> > >> >>>> have read that update handlers have indexes built against them.
> > >> >>>> If this is a fire-once option, is that worthwhile?
> > >> >>>
> > >> >>> I'm not sure what you mean by update handlers having indexes built
> > >> >>> against them. That doesn't match anything that currently exists in
> > >> >>> CouchDB.
> > >> >>>
> > >> >>>> Which option is better?  Is there an even better way?
> > >> >>>
> > >> >>> There's nothing better than the general ideas you listed.
> > >> >>>
> > >> >>>> Thanks,
> > >> >>>> Charles
> > >> >>>
> > >>
> > >
> > >
>
>
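
For what it's worth, the view-plus-_bulk_docs pattern Paul describes above
comes out to roughly this in node (untested sketch; the design doc name, view
name, field names, and batch size are all made up for illustration):

// Map function, saved in a design doc such as _design/migrate, view "needs_v2".
// It only emits docs that still need the migration, so updated docs fall out
// of the view and the client loop below knows when it is done.
function (doc) {
  if (!doc.schema_version || doc.schema_version < 2) {
    emit(doc._id, null);
  }
}

// Client side: pull a batch of matching docs, rewrite them, POST them back
// through _bulk_docs, and repeat until the view comes up empty.
var http = require('http');

function getJSON(path, cb) {
  http.get({host: 'localhost', port: 5984, path: path}, function (res) {
    var buf = '';
    res.on('data', function (chunk) { buf += chunk; });
    res.on('end', function () { cb(JSON.parse(buf)); });
  });
}

function migrateBatch() {
  getJSON('/mydb/_design/migrate/_view/needs_v2?limit=100&include_docs=true',
          function (result) {
    if (result.rows.length === 0) {
      console.log('migration complete');
      return;
    }
    var docs = result.rows.map(function (row) {
      var doc = row.doc;
      doc.schema_version = 2;   // the actual model change goes here
      return doc;
    });
    var req = http.request({
      host: 'localhost', port: 5984, path: '/mydb/_bulk_docs',
      method: 'POST', headers: {'Content-Type': 'application/json'}
    }, function () { migrateBatch(); });
    req.end(JSON.stringify({docs: docs}));
  });
}

migrateBatch();

The per-doc update handler route Charles mentions would replace the _bulk_docs
POST with one request per document against a handler in the design doc, but as
Paul says, batching through _bulk_docs is almost certainly the faster option.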
