incubator-couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Bulk Updates in CouchDB
Date Tue, 16 Nov 2010 10:48:21 GMT
Hi Neville,

On 16 Nov 2010, at 10:44, Neville Franks wrote:

> Thanks for the prompt response. I have to say that I am very, very
> surprised that what seems to me are such basic operations aren't
> available natively within CouchDB.

It is less a case of a basic operation being unsupported and more a
reflection of the difference in philosophy between CouchDB and, say,
SQLite.


> This is probably a deal breaker for my use and I would have thought
> many others. My concern is iterating over a large number of documents
> on a remote server just to do simple updates. It means I need to do
> several HTTP requests (GET/PUT/DELETE) for each document in a set of
> possibly thousands or tens of thousands. I'm in Australia and the
> server is in the US and I would imagine this making an application
> unusable.

A couple of thoughts:

 - How often does that run? Of course, the operation will be slower
   than telling the server to update a bunch of fields*, but if it is
   a rare occurrence, it may not be that big a deal.

    * CouchDB doesn't have a notion of "fields", hence this operation
      proves a little tricky.

 - CouchDB could handle the bulk updating for you, but it'd essentially
   do the same things you'd do, except for the HTTP overhead. If you set it
   up smartly, you create a view to fetch all documents you want to edit,
   update them and then send all the changes back to CouchDB in a single
   bulk request.

   Again, yes, you could save time transferring that data back and forth
   but the main cost of the operation is more likely disk I/O that will
   happen regardless.

 - One mode of operation for CouchDB is distributed, offline. You could
   have a CouchDB instance locally in Australia and make all your changes
   there in a low-latency situation (but then, you'd probably only need two
   requests for 10-100k documents) and later replicate your results to the
   US.

 - Even if CouchDB were to support bulk editing on the server (I think
   it would be a great addition), it wouldn't guarantee any transaction
   semantics. (You didn't name that specifically, but it usually comes up
   quickly in these discussions.) This means that while the update operation
   is in progress, other clients could see some documents in the pre-update
   state and some in the post-update state, and your app needs to be OK
   with that.
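
The view-then-bulk-request approach from the second point can be sketched
roughly as follows. This is only a sketch under assumptions: the helper names
buildBulkUpdate and buildBulkDelete are made up for illustration and are not
part of CouchDB, and the actual HTTP calls are left out. It assumes you query
a view with include_docs=true for updates (so each row carries the full
document), or a view that emits doc._rev as its value for deletes, and then
POST the resulting body to the database's _bulk_docs resource.

```javascript
// Hypothetical helpers; the names are illustrative, not CouchDB APIs.

// Build a _bulk_docs body that updates every document in `rows`.
// `rows` is the "rows" array of a view queried with include_docs=true,
// so each row has a `doc` property holding _id, _rev and the fields.
function buildBulkUpdate(rows, transform) {
  return {
    docs: rows.map(function (row) {
      // Work on a shallow copy so the fetched view result stays untouched.
      return transform(Object.assign({}, row.doc));
    })
  };
}

// Build a _bulk_docs body that deletes every document in `rows`.
// Assumes the view's map function emitted doc._rev as the value,
// so row.id is the document id and row.value is its revision.
function buildBulkDelete(rows) {
  return {
    docs: rows.map(function (row) {
      return { _id: row.id, _rev: row.value, _deleted: true };
    })
  };
}

// Example: roughly UPDATE foo SET bar="baz" WHERE qux="quux",
// given rows from a view keyed on doc.qux and queried with key="quux".
var sampleRows = [
  { id: "a", key: "quux", value: null,
    doc: { _id: "a", _rev: "1-abc", qux: "quux", bar: "old" } }
];
var body = buildBulkUpdate(sampleRows, function (doc) {
  doc.bar = "baz";
  return doc;
});
console.log(JSON.stringify(body));
```

That is two requests in total (one view query, one POST to _bulk_docs)
regardless of how many documents match. Any document that changed on the
server between the two requests comes back as a conflict in the _bulk_docs
response and has to be retried, which is the lack of transaction semantics
mentioned above.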


> I am getting the feeling that CouchDB is great for storing lots of
> information and getting it back in lots of interesting ways but not a
> good fit for typical CRUD stuff that's done in SQL all the time.
> Please correct me if I'm wrong.

It is plenty good for CRUD operations, except for the case where you
want to emulate `UPDATE foo SET bar="baz" WHERE qux="quux";`.

The question then is how frequent "all the time" is. I know I've done
my share of bulk updates in SQL land, but the apps I build rarely use
that feature as one of the things they do all the time.

I can see that background processes and cronjobs may have more use for
that particular feature.

--

Come to think of it, I think I'll explore my old idea of "compaction
with a transformation function" again :)

Cheers
Jan
-- 



> 
> 
> Tuesday, November 16, 2010, 2:06:18 PM, you wrote:
> 
> k> You can't do this at the moment. At least, not that I know of.
> 
> k> For #1, my current trick is to generate a view with a map function
> k> that looks like this:
> 
> k> map: function(doc) {
> k>     emit(null, doc._rev);
> k> }
> 
> k> This makes it easy to convert the results of the view (without
> k> stale=ok) into a bulk-delete by iterating over the view, but not
> k> fetching each document to get at the revision. This does assume that the
> k> documents haven't been updated between the view-fetch and the
> k> bulk-delete. But for my use-case, this works.
> 
> k> For #2, you have to iterate over from the client side. Unless, someone
> k> else has another idea.
> 
> k> K.
> k> ---
> k> http://www.pcapr.net
> k> http://twitter.com/pcapr
> k> http://labs.mudynamics.com
> 
> k> On Mon, Nov 15, 2010 at 7:01 PM, Neville Franks <subs@surfulater.com> wrote:
>>> Hi,
>>> I am just learning about CouchDB so please excuse this nooby question.
>>> 
>>> I've read lots over the past few days including
>>> http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API and a fair bit
>>> of two of the online CouchDB books.
>>> 
>>> My question is how do I do some simple things like:
>>> 
>>> 1) Delete all documents where key.value = xxx
>>> 
>>> 2) Update all documents where key.value = xxx so value = yyy
>>> 
>>> I want the DB to do these, not for me to have to iterate through the
>>> DB in code. From what I've read about Views, they are read-only and
>>> therefore can't be used in update/delete operations.
>>> 
>>> I've read lots on views and CouchDB seems great at getting information
>>> out in all sorts of ways, however basic bulk update/delete operations
>>> are so far eluding me.
>>> 
>>> My main exposure to DBs at this time is using SQLite and these sorts
>>> of things are of course easy and quick to do in SQL.
>>> 
>>> Hopefully I'm missing something obvious.
>>> 
>>> ---
>>> Neville Franks,  http://www.surfulater.com
> 
> Neville Franks,  http://www.surfulater.com
> 

