incubator-couchdb-user mailing list archives

From Lance Carlson <lancecarl...@gmail.com>
Subject Re: Mass updates
Date Wed, 15 May 2013 06:38:21 GMT
There are other benefits to having your dataset in Redis rather than in your
own process's RAM, BTW. For one, it's easier to run multiple processes against
your dataset to split up the work of manipulating the data.

Anyway, I've not really stressed the upper bounds of Redis RAM limits. Our
largest datasets only use a total of 150 MB in Redis, so it hasn't been a big
deal yet.


On Wed, May 15, 2013 at 2:26 AM, Lance Carlson <lancecarlson@gmail.com> wrote:

> I use Redis to stick docs into RAM. Once they're in RAM, I like to use
> node to parse the docs in the way I want them, then purge the dataset.
> Couchout pulls them into RAM using Redis, couchin bulk_saves back into
> couchdb from Redis. I tried to make the couchout/in tools language agnostic.
>
> Anyway, you can certainly use whatever language you want and load all of
> the docs into memory. Typically, though, if you're dealing with a language
> that isn't statically compiled, you're going to run into situations where
> Redis would be more efficient.
>
>
> On Wed, May 15, 2013 at 2:17 AM, James Marca <jmarca@translab.its.uci.edu> wrote:
>
>> On Mon, May 13, 2013 at 02:24:50AM -0400, Lance Carlson wrote:
>> > Oops, urls:
>> >
>> > https://github.com/lancecarlson/couchin.go
>> > https://github.com/lancecarlson/couchout.go
>> >
>> > Feedback appreciated!
>> >
>>
>> I don't understand the use case here, so I'd appreciate an example.
>> If you can define a view or use all_docs to pull docs from couch and
>> into redis, why use redis at all?  Why not just use couch directly,
>> load docs into ram, and process them?
>>
>> I feel like I'm missing something obvious.
>>
>> Also, I've never stressed Redis much.  What happens when you bump up
>> against ram limits?
>>
>> James
>> >
>> > On Mon, May 13, 2013 at 2:24 AM, Lance Carlson <lancecarlson@gmail.com> wrote:
>> >
>> > > Made a lot of updates to my couchout project. It now includes a couchin
>> > > project as well. Might create another project for updating, but it's
>> > > pretty easy for someone to write a node.js script (or any language, for
>> > > that matter) that connects to Redis and decodes and encodes base64.
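>> > >
>> > > Something along these lines would do it (untested sketch, node 18+ as an
>> > > ES module; it assumes the node-redis client and a made-up list key "docs"
>> > > holding base64-encoded JSON docs, which may not match how couchout
>> > > actually lays things out):
>> > >
>> > >     // Untested sketch: read base64-encoded docs out of Redis, tweak
>> > >     // them, and write them back, ready to be bulk_saved into CouchDB.
>> > >     import { createClient } from 'redis';
>> > >
>> > >     const client = createClient({ url: 'redis://localhost:6379' });
>> > >     await client.connect();
>> > >
>> > >     const encoded = await client.lRange('docs', 0, -1); // made-up key
>> > >     const updated = encoded.map((b64) => {
>> > >       const doc = JSON.parse(Buffer.from(b64, 'base64').toString('utf8'));
>> > >       doc.migrated = true; // made-up change
>> > >       return Buffer.from(JSON.stringify(doc), 'utf8').toString('base64');
>> > >     });
>> > >
>> > >     await client.del('docs');
>> > >     if (updated.length > 0) await client.rPush('docs', updated);
>> > >     await client.quit();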
>> > >
>> > >
>> > > On Sat, May 11, 2013 at 2:27 AM, Andrey Kuprianov <
>> > > andrey.kouprianov@gmail.com> wrote:
>> > >
>> > >> We do that, and we have a cron to touch the view every 5 min. It's just
>> > >> that at that particular time we had to insert those 150k in one go (we
>> > >> were migrating from MySQL).
>> > >>
>> > >> Sent from my iPhone
>> > >>
>> > >> On 11 May, 2013, at 1:02 PM, Benoit Chesneau <bchesneau@gmail.com> wrote:
>> > >>
>> > >> > On May 9, 2013 1:17 PM, "Andrey Kuprianov" <andrey.kouprianov@gmail.com> wrote:
>> > >> >>
>> > >> >> Rebuilding the views mentioned by James is hell! And the more docs
>> > >> >> and views you have, the longer it will take for your views to catch
>> > >> >> up with the updates. We don't have the best of servers, but ours
>> > >> >> (dedicated) took several hours to rebuild our views (not too many,
>> > >> >> either) after we inserted ~150k documents (we use full-text search
>> > >> >> with Lucene as well, so it also contributed to the overall server
>> > >> >> slowdown).
>> > >> >>
>> > >> >> So my suggestion is:
>> > >> >>
>> > >> >> 1. Once you want to migrate your stuff, make a copy of your db.
>> > >> >> 2. Do the migration on the copy.
>> > >> >> 3. Allow the views to rebuild (you need to query a single view in
>> > >> >> each design document once to trigger the views to start catching up
>> > >> >> with the updates; a sketch follows below). You'd probably ask whether
>> > >> >> it's possible to limit CouchDB's resource usage while views are
>> > >> >> rebuilding, but I don't have an answer to that question. Maybe
>> > >> >> someone else can help here...
>> > >> >> 4. Switch the database pointer from one DB to the other.
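>> > >> >>
>> > >> >> For step 3, something like this should be enough to kick the indexers
>> > >> >> off (untested, node 18+ as an ES module; the db, design doc, and view
>> > >> >> names are made up):
>> > >> >>
>> > >> >>     // Untested sketch: querying any one view in a design doc makes
>> > >> >>     // CouchDB update all views in that design doc, so one request
>> > >> >>     // per ddoc is enough; limit=0 skips returning rows.
>> > >> >>     const couch = 'http://localhost:5984';
>> > >> >>     const views = [
>> > >> >>       { ddoc: 'posts', view: 'by_date' },
>> > >> >>       { ddoc: 'users', view: 'by_email' },
>> > >> >>     ];
>> > >> >>     for (const { ddoc, view } of views) {
>> > >> >>       const url = `${couch}/mydb/_design/${ddoc}/_view/${view}?limit=0`;
>> > >> >>       const res = await fetch(url);
>> > >> >>       console.log(ddoc, view, res.status);
>> > >> >>     }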
>> > >> >
>> > >> > You don't need to wait until all the docs are there to trigger the
>> > >> > view update, just trigger it more often. That way view calculation
>> > >> > will happen on a smaller set.
>> > >> >
>> > >> > You can even run it in parallel by using different ddocs.
>> > >> >>
>> > >> >>
>> > >> >> On Thu, May 9, 2013 at 1:41 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>> > >> >>
>> > >> >>> On Wed, May 8, 2013 at 10:24 PM, Charles S. Koppelman-Milstein
>> > >> >>> <ckoppel@alumni.gwu.edu> wrote:
>> > >> >>>> I am trying to understand whether Couch is the way to go to meet
>> > >> >>>> some of my organization's needs.  It seems pretty terrific.
>> > >> >>>> The main concern I have is maintaining a consistent state across
>> > >> >>>> code releases.  Presumably, our data model will change over the
>> > >> >>>> course of time, and when it does, we need to make the several
>> > >> >>>> million old documents conform to the new model.
>> > >> >>>>
>> > >> >>>> Although I would love to pipe a view through an update handler and
>> > >> >>>> call it a day, I don't believe that option exists.  The two ways I
>> > >> >>>> understand to do this are:
>> > >> >>>>
>> > >> >>>> 1. Query all documents, update each doc client-side, and PUT those
>> > >> >>>> changes to the _bulk_docs API (presumably this should be done in
>> > >> >>>> batches; a sketch follows below)
>> > >> >>>> 2. Query the ids for all docs, and one at a time, PUT them through
>> > >> >>>> an update handler
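>> > >> >>>>
>> > >> >>>> For the first option I'm picturing something like this (untested,
>> > >> >>>> node 18+ as an ES module; the db name and the schema_version change
>> > >> >>>> are made up):
>> > >> >>>>
>> > >> >>>>     // Untested sketch: pull every doc, change it client-side, and
>> > >> >>>>     // POST the changes back through _bulk_docs in batches.
>> > >> >>>>     const couch = 'http://localhost:5984';
>> > >> >>>>     const all = await (
>> > >> >>>>       await fetch(`${couch}/mydb/_all_docs?include_docs=true`)
>> > >> >>>>     ).json();
>> > >> >>>>     const docs = all.rows
>> > >> >>>>       .map((r: any) => r.doc)
>> > >> >>>>       .filter((d: any) => !d._id.startsWith('_design/'));
>> > >> >>>>
>> > >> >>>>     const BATCH = 500;
>> > >> >>>>     for (let i = 0; i < docs.length; i += BATCH) {
>> > >> >>>>       const batch = docs
>> > >> >>>>         .slice(i, i + BATCH)
>> > >> >>>>         .map((d: any) => ({ ...d, schema_version: 2 })); // made-up change
>> > >> >>>>       const res = await fetch(`${couch}/mydb/_bulk_docs`, {
>> > >> >>>>         method: 'POST',
>> > >> >>>>         headers: { 'Content-Type': 'application/json' },
>> > >> >>>>         body: JSON.stringify({ docs: batch }),
>> > >> >>>>       });
>> > >> >>>>       console.log('batch', i / BATCH, res.status);
>> > >> >>>>     }
>> > >> >>>>
>> > >> >>>> For several million docs I assume we'd page _all_docs with limit
>> > >> >>>> and startkey instead of loading everything at once.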
>> > >> >>>
>> > >> >>> You are correct that there's no server-side way to do a migration
>> > >> >>> like you're asking for.
>> > >> >>>
>> > >> >>> The general pattern for these things is to write a view that only
>> > >> >>> includes the documents that need to be changed and then write
>> > >> >>> something that goes through and processes each doc in the view into
>> > >> >>> the desired form (which removes it from the view). This way you can
>> > >> >>> easily know when you're done working. It's definitely possible to
>> > >> >>> write something that stores state and/or just brute-forces a db scan
>> > >> >>> each time you run the migration.
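>> > >> >>>
>> > >> >>> A view along these lines is what I have in mind (sketch only, node
>> > >> >>> 18+; the design doc name and the schema_version field are made up):
>> > >> >>>
>> > >> >>>     // Untested sketch: the map function emits only docs that still
>> > >> >>>     // need migrating, so the view drains to empty as the migration
>> > >> >>>     // saves each doc in its new form.
>> > >> >>>     const ddoc = {
>> > >> >>>       _id: '_design/migrations',
>> > >> >>>       views: {
>> > >> >>>         needs_v2: {
>> > >> >>>           map: `function (doc) {
>> > >> >>>             if (!doc.schema_version || doc.schema_version < 2) {
>> > >> >>>               emit(doc._id, null);
>> > >> >>>             }
>> > >> >>>           }`,
>> > >> >>>         },
>> > >> >>>       },
>> > >> >>>     };
>> > >> >>>
>> > >> >>>     const res = await fetch('http://localhost:5984/mydb/_design/migrations', {
>> > >> >>>       method: 'PUT',
>> > >> >>>       headers: { 'Content-Type': 'application/json' },
>> > >> >>>       body: JSON.stringify(ddoc),
>> > >> >>>     });
>> > >> >>>     console.log(res.status);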
>> > >> >>>
>> > >> >>> Performance-wise, your first suggestion would probably be the most
>> > >> >>> performant, although depending on document sizes and latencies it
>> > >> >>> may be possible to get better numbers using an update handler. I
>> > >> >>> doubt it, though, unless you have huge docs and a super slow
>> > >> >>> connection with high latencies.
>> > >> >>>
>> > >> >>>> Are these options reasonably performant?  If we have to do a
>> > >> >>>> mass-update once per deployment, it's not terrible if it's not
>> > >> >>>> lightning-fast, but it shouldn't take terribly long.  Also, I have
>> > >> >>>> read that update handlers have indexes built against them.  If this
>> > >> >>>> is a fire-once option, is that worthwhile?
>> > >> >>>
>> > >> >>> I'm not sure what you mean by update handlers having indexes built
>> > >> >>> against them. That doesn't match anything that currently exists in
>> > >> >>> CouchDB.
>> > >> >>>
>> > >> >>>> Which option is better?  Is there an even better way?
>> > >> >>>
>> > >> >>> There's nothing better than the general ideas you've listed.
>> > >> >>>
>> > >> >>>> Thanks,
>> > >> >>>> Charles
>> > >> >>>
>> > >>
>> > >
>> > >
>>
>>
>
