From: Lance Carlson
Date: Wed, 15 May 2013 02:26:09 -0400
Subject: Re: Mass updates
To: user@couchdb.apache.org

I use Redis to stick the docs in RAM. Once they're in RAM, I use node to
parse the docs into the shape I want, then purge the dataset. couchout
pulls them from CouchDB into Redis; couchin bulk_saves them from Redis back
into CouchDB. I tried to make the couchout/couchin tools language-agnostic.

Anyway, you can certainly use whatever language you want and load all of
the docs into memory. Typically, though, if you're working in a language
that isn't statically compiled, you'll run into situations where holding
the working set in Redis is more efficient.

On Wed, May 15, 2013 at 2:17 AM, James Marca wrote:

> On Mon, May 13, 2013 at 02:24:50AM -0400, Lance Carlson wrote:
> > Oops, urls:
> >
> > https://github.com/lancecarlson/couchin.go
> > https://github.com/lancecarlson/couchout.go
> >
> > Feedback appreciated!
>
> I don't understand the use case here, so I'd appreciate an example.
> If you can define a view or use _all_docs to pull docs from couch into
> redis, why use redis at all? Why not just use couch directly, load the
> docs into RAM, and process them?
>
> I feel like I'm missing something obvious.
>
> Also, I've never stressed Redis much. What happens when you bump up
> against RAM limits?
>
> James
>
> > On Mon, May 13, 2013 at 2:24 AM, Lance Carlson wrote:
> > > Made a lot of updates to my couchout project. It now includes a
> > > couchin project as well. I might create another project for updating,
> > > but it's pretty easy to write a node js script (or a script in any
> > > language, for that matter) that connects to redis and decodes and
> > > encodes base64.
> > >
> > > On Sat, May 11, 2013 at 2:27 AM, Andrey Kuprianov
> > > <andrey.kouprianov@gmail.com> wrote:
> > > > We do that, and we have a cron job to touch the views every 5
> > > > minutes. It's just that at that particular time we had to insert
> > > > those 150k docs in one go (we were migrating from mysql).
> > > >
> > > > Sent from my iPhone
> > > >
> > > > On 11 May, 2013, at 1:02 PM, Benoit Chesneau wrote:
> > > > > On May 9, 2013 1:17 PM, "Andrey Kuprianov"
> > > > > <andrey.kouprianov@gmail.com> wrote:
> > > > > > Rebuilding the views mentioned by James is hell! And the more
> > > > > > docs and views you have, the longer your views will take to
> > > > > > catch up with the updates. We don't have the best of servers,
> > > > > > but ours (dedicated) took several hours to rebuild our views
> > > > > > (not too many of them, either) after we inserted ~150k
> > > > > > documents (we use full-text search with Lucene as well, so it
> > > > > > also contributed to the overall server slowdown).
> > > > > >
> > > > > > So my suggestion is:
> > > > > >
> > > > > > 1. Once you want to migrate your stuff, make a copy of your db.
> > > > > > 2. Do the migration on the copy.
> > > > > > 3. Allow the views to rebuild (you need to query one view in
> > > > > >    each design document once to trigger the views to start
> > > > > >    catching up with the updates). You'd probably ask whether
> > > > > >    it's possible to limit Couch's resource usage while views
> > > > > >    are rebuilding, but I don't have an answer to that question.
> > > > > >    Maybe someone else can help here...
> > > > > > 4. Switch the database pointer from one DB to the other.
> > > > >
> > > > > You don't need to wait until all the docs are there to trigger
> > > > > the view update, just trigger it more often, so the view
> > > > > calculation happens on a smaller set. You can even run it in
> > > > > parallel by using different ddocs.
> > > > >
> > > > > > On Thu, May 9, 2013 at 1:41 PM, Paul Davis
> > > > > > <paul.joseph.davis@gmail.com> wrote:
> > > > > > > On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
> > > > > > > wrote:
> > > > > > > > I am trying to understand whether Couch is the way to go to
> > > > > > > > meet some of my organization's needs. It seems pretty
> > > > > > > > terrific. The main concern I have is maintaining a
> > > > > > > > consistent state across code releases. Presumably, our data
> > > > > > > > model will change over the course of time, and when it
> > > > > > > > does, we need to make the several million old documents
> > > > > > > > conform to the new model.
> > > > > > > >
> > > > > > > > Although I would love to pipe a view through an update
> > > > > > > > handler and call it a day, I don't believe that option
> > > > > > > > exists.
> > > > > > > > The two ways I understand how to do this are:
> > > > > > > >
> > > > > > > > 1. Query all documents, update each doc client-side, and
> > > > > > > >    send those changes through the _bulk_docs API (presumably
> > > > > > > >    this should be done in batches).
> > > > > > > > 2. Query the ids of all docs and, one at a time, PUT them
> > > > > > > >    through an update handler.
> > > > > > >
> > > > > > > You are correct that there's no server-side way to do the kind
> > > > > > > of migration you're asking for.
> > > > > > >
> > > > > > > The general pattern for these things is to write a view that
> > > > > > > only includes the documents that need to be changed, and then
> > > > > > > write something that goes through and processes each doc in
> > > > > > > the view into the desired form (which removes it from the
> > > > > > > view). That way you can easily know when you're done working.
> > > > > > > It's definitely possible to write something that stores state
> > > > > > > and/or just brute-forces a db scan each time you run the
> > > > > > > migration.
> > > > > > >
> > > > > > > Performance-wise, your first suggestion would probably be the
> > > > > > > most performant, although depending on document sizes and
> > > > > > > latencies it may be possible to get better numbers with an
> > > > > > > update handler. I doubt it, though, unless you have huge docs
> > > > > > > and a super slow connection with high latencies.
> > > > > > >
> > > > > > > > Are these options reasonably performant? If we have to do a
> > > > > > > > mass update once per deployment, it's not terrible if it's
> > > > > > > > not lightning-fast, but it shouldn't take terribly long.
> > > > > > > > Also, I have read that update handlers have indexes built
> > > > > > > > against them. If this is a fire-once option, is that
> > > > > > > > worthwhile?
> > > > > > >
> > > > > > > I'm not sure what you mean by update handlers having indexes
> > > > > > > built against them. That doesn't match anything that currently
> > > > > > > exists in CouchDB.
> > > > > > >
> > > > > > > > Which option is better? Is there an even better way?
> > > > > > >
> > > > > > > There's nothing better than the general ideas you listed.
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Charles
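
For anyone who, like James, wants a concrete example of the workflow Lance
describes above (pull docs out of CouchDB into Redis, transform them, then
bulk-save them back), here is a minimal sketch. It is not Lance's actual
couchout/couchin code; the database name, Redis key prefix, and the trivial
transform are invented for illustration, and it assumes Node 18+ (built-in
fetch) and the node-redis v4 client.

// Sketch only: stage CouchDB docs in Redis, transform, then _bulk_docs back.
// COUCH, DB, the "stage:" prefix, and the transform are illustrative.
const { createClient } = require('redis');

const COUCH = 'http://localhost:5984';
const DB = 'mydb';                       // hypothetical database name

async function stage(redis) {
  // Pull every doc into Redis, one key per doc (the couchout step).
  const res = await fetch(`${COUCH}/${DB}/_all_docs?include_docs=true`);
  const { rows } = await res.json();
  for (const row of rows) {
    // Base64-encode the JSON, as the thread mentions the tools do.
    const body = Buffer.from(JSON.stringify(row.doc)).toString('base64');
    await redis.set(`stage:${row.id}`, body);
  }
}

async function flush(redis) {
  // Read the staged docs back, transform them, and bulk-save (the couchin step).
  const keys = await redis.keys('stage:*');
  const docs = [];
  for (const key of keys) {
    const doc = JSON.parse(Buffer.from(await redis.get(key), 'base64').toString());
    doc.migrated = true;                 // stand-in for a real transformation
    docs.push(doc);
  }
  // _bulk_docs reuses each doc's _rev, so docs written by someone else
  // between stage() and flush() will come back as conflicts.
  await fetch(`${COUCH}/${DB}/_bulk_docs`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ docs }),
  });
}

(async () => {
  const redis = createClient();
  await redis.connect();
  await stage(redis);
  await flush(redis);
  await redis.quit();
})();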
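
Paul's suggested pattern (a view that emits only the documents still needing
migration, drained in batches until it is empty) might look roughly like the
sketch below. The design-doc name, view name, and schema_version field are
placeholders; only the CouchDB endpoints (the view query and _bulk_docs) are
real, and Node 18+ is again assumed for the built-in fetch.

// Sketch of the "self-clearing view" migration Paul describes.
const COUCH = 'http://localhost:5984';
const DB = 'mydb';
const VIEW = `${COUCH}/${DB}/_design/migrations/_view/needs_v2`;
const BATCH = 500;

// The view's map function (stored in _design/migrations) would be roughly:
//   function (doc) {
//     if (!doc.schema_version || doc.schema_version < 2) emit(doc._id, null);
//   }
// Migrated docs stop being emitted, so an empty view means you're done.

async function migrateBatch() {
  const res = await fetch(`${VIEW}?limit=${BATCH}&include_docs=true`);
  const { rows } = await res.json();
  if (rows.length === 0) return false;    // nothing left to migrate

  const docs = rows.map(({ doc }) => ({
    ...doc,
    schema_version: 2,                    // stand-in for the real change
  }));

  const resp = await fetch(`${COUCH}/${DB}/_bulk_docs`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ docs }),
  });
  // _bulk_docs reports per-doc conflicts; conflicted docs stay in the view
  // and are retried with a fresh _rev on the next pass.
  const results = await resp.json();
  console.log(`updated ${results.filter((r) => r.ok).length}/${docs.length}`);
  return true;
}

(async () => {
  while (await migrateBatch()) { /* keep going until the view is empty */ }
})();

Andrey's and Benoit's point about view rebuilds applies here as well:
querying a view is what triggers indexing, so hitting each design document's
view periodically (for example from cron) during a large migration lets the
indexes catch up on smaller sets instead of all at once at the end.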
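
For completeness, Charles's second option (pushing each document through an
update handler) would look something like the sketch below; the handler name
and the field it sets are again placeholders. An update function runs
server-side, receives the stored doc and the request, and returns the new doc
plus a response body, but it is still invoked once per document, which is why
Paul expects batched _bulk_docs to be faster.

// Update handler sketch (option 2 in Charles's message).
// In the design doc (e.g. _design/migrations):
//   "updates": {
//     "set-version": "function (doc, req) { if (!doc) return [null, 'missing']; doc.schema_version = 2; return [doc, 'ok']; }"
//   }
const COUCH = 'http://localhost:5984';
const DB = 'mydb';

async function updateOne(id) {
  // POST /db/_design/ddoc/_update/handler/docid runs the function server-side.
  const res = await fetch(
    `${COUCH}/${DB}/_design/migrations/_update/set-version/${encodeURIComponent(id)}`,
    { method: 'POST' }
  );
  return res.text();                      // 'ok' or 'missing'
}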