incubator-couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Bulk Updates in CouchDB
Date Tue, 16 Nov 2010 10:52:16 GMT

On 16 Nov 2010, at 11:48, Jan Lehnardt wrote:

> Hi Neville,
> 
> On 16 Nov 2010, at 10:44, Neville Franks wrote:
> 
>> Thanks for the prompt response. I have to say that I am very, very
>> surprised that what seems to me are such basic operations aren't
>> available natively within CouchDB.
> 
> It is less that this is a basic operation that isn't supported and
> more that it shows the difference in philosophy between CouchDB and,
> say, SQLite.
> 
> 
>> This is probably a deal breaker for my use and I would have thought
>> many others. My concern is iterating over a large number of documents
>> on a remote server just to do simple updates. It means I need to do
>> several HTTP requests (GET/PUT/DELETE) for each document in a set of
>> possibly thousands or tens of thousands. I'm in Australia and the
>> server is in the US and I would imagine this making an application
>> unusable.
> 
> A couple of thoughts:
> 
> - How often does that run? — Of course, the operation will be slower
>   than telling the server to update a bunch of fields*, but if it is
>   a rare occurrence, it may not be that big a deal.
> 
>    * CouchDB doesn't have a notion of "fields", hence this operation
>      proves a little tricky.
> 
> - CouchDB could handle the bulk updating for you, but it'd essentially
>   do the same things you'd do, expect the HTTP overhead. If you set it

                                ^^^^^^-except.


>   up smartly, you create a view to fetch all documents you want to edit,
>   update them and then send back a bulk request with all change requests
>   back to CouchDB.
> 
>   Again, yes, you could save time transferring that data back and forth
>   but the main cost of the operation is more likely disk I/O that will
>   happen regardless.
> 
> - One mode of operation for CouchDB is distributed, offline. You could
>   have a CouchDB instance locally in Australia and make all your changes
>   there in a low-latency situation (but then, you'd probably only need two
>   requests for 10-100k documents) and later replicate your results to the
>   US.
> 
> - Even if CouchDB were to support bulk editing on the server (I think
>   it would be a great addition), it wouldn't guarantee any transaction
>   semantics. (You didn't name that specifically, but it usually comes up
>   quickly in these discussions.) This means that while the update operation
>   is in progress, other clients could possibly see some documents in the
>   pre and some in the post-state and your app needs to be OK with that.
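
The same caveat already applies to a client-side bulk request: by default
each document in a `_bulk_docs` call is saved or rejected on its own. A
minimal sketch of sorting out the per-document results, assuming the
response is an array in which failed entries carry an `error` field (for
example `conflict`). The helper name `splitBulkResults` and the sample
response are made up for illustration.

```javascript
// Hypothetical helper: separate the per-document results of a
// POST /<db>/_bulk_docs request into saved and failed entries.
function splitBulkResults(results) {
  const saved = [];
  const failed = [];
  results.forEach(function (entry) {
    // Failed entries carry an `error` field (for example "conflict").
    if (entry.error) {
      failed.push(entry);
    } else {
      saved.push(entry);
    }
  });
  return { saved: saved, failed: failed };
}

// Invented sample response: one document saved, one rejected.
const sampleResponse = [
  { id: "doc-a", rev: "2-aaa" },
  { id: "doc-b", error: "conflict", reason: "Document update conflict." },
];

const outcome = splitBulkResults(sampleResponse);
console.log("saved:", outcome.saved.length, "failed:", outcome.failed.length);
```

The failed entries are the ones a client would need to re-fetch and retry;
anything reading the database in the meantime may see a mix of old and new
documents.
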
> 
> 
>> I am getting the feeling that CouchDB is great for storing lots of
>> information and getting it back in lots of interesting ways but not a
>> good fit for typical CRUD stuff that's done in SQL all the time.
>> Please correct me if I'm wrong.
> 
> It is plenty good for CRUD operations. Except for the case where you
> want to emulate `UPDATE foo SET bar="baz" WHERE qux="quux";`.
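
For illustration, a rough client-side sketch of emulating that statement.
It assumes a view (here called `by_qux`, a made-up name) that indexes
documents by `doc.qux` and is queried with `key="quux"` and
`include_docs=true`, so every row carries the full document. The helper
`buildBulkUpdate` and the sample rows are hypothetical, not an existing
CouchDB feature.

```javascript
// Map function for the hypothetical "by_qux" view. emit() is supplied by
// CouchDB's view server, so this function is only defined here, not called.
const byQuxMap = function (doc) {
  if (doc.qux !== undefined) {
    emit(doc.qux, null);
  }
};

// Build a _bulk_docs body from rows fetched with include_docs=true,
// overlaying `changes` on a copy of each matched document.
function buildBulkUpdate(viewResult, changes) {
  return {
    docs: viewResult.rows.map(function (row) {
      // The copy keeps _id and _rev, which CouchDB needs for the update.
      return Object.assign({}, row.doc, changes);
    }),
  };
}

// Invented sample, shaped like a view response for key="quux".
const sampleRows = {
  rows: [
    { id: "doc-a", key: "quux", value: null,
      doc: { _id: "doc-a", _rev: "1-abc", qux: "quux", bar: "old" } },
    { id: "doc-b", key: "quux", value: null,
      doc: { _id: "doc-b", _rev: "4-def", qux: "quux" } },
  ],
};

const bulkBody = buildBulkUpdate(sampleRows, { bar: "baz" });
console.log(JSON.stringify(bulkBody, null, 2));
```

The resulting body would be sent in a single POST to the database's
`_bulk_docs` resource, so the whole update costs one view request plus one
bulk request rather than a round trip per document.
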
> 
> The question then is how frequent "all the time" is. I know I've done
> my share of bulk updates in SQL land, but the apps I build rarely use
> that feature as one of the things they do all the time.
> 
> I can see that background processes and cronjobs may have more use for
> that particular feature.
> 
> --
> 
> Come to think of it, I think I'll explore my old idea of "compaction
> with a transformation function" again :)
> 
> Cheers
> Jan
> -- 
> 
> 
> 
>> 
>> 
>> Tuesday, November 16, 2010, 2:06:18 PM, you wrote:
>> 
>> k> You can't do this at the moment. At least, not that I know of.
>> 
>> k> For #1, my current trick is to generate a view with a map function
>> k> that looks like this:
>> 
>> k> map: function(doc) {
>> k>     emit(null, doc._rev);
>> k> }
>> 
>> k> This makes it easy to convert the results of the view (without
>> k> stale=ok) into a bulk-delete by iterating over the view, but not
>> k> fetching each document to get at the revision. This does assume that the
>> k> documents haven't been updated between the view-fetch and the
>> k> bulk-delete. But for my use-case, this works.
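
A rough sketch of that conversion (not the poster's actual code): each row
returned by such a view has the document id in `id` and the emitted
revision in `value`, so the rows map directly onto a `_bulk_docs` body
whose entries are flagged `_deleted: true`. The helper name
`rowsToBulkDelete` and the sample rows are invented for illustration.

```javascript
// Hypothetical helper: turn rows from a view whose map function does
// emit(null, doc._rev) into a body for POST /<db>/_bulk_docs.
function rowsToBulkDelete(viewResult) {
  return {
    docs: viewResult.rows.map(function (row) {
      // row.id is the document id; row.value is the emitted _rev.
      return { _id: row.id, _rev: row.value, _deleted: true };
    }),
  };
}

// Invented sample, shaped like a CouchDB view response.
const sampleView = {
  total_rows: 2,
  offset: 0,
  rows: [
    { id: "doc-a", key: null, value: "1-abc" },
    { id: "doc-b", key: null, value: "3-def" },
  ],
};

const deleteBody = rowsToBulkDelete(sampleView);
console.log(JSON.stringify(deleteBody));
```

If a document changed after the view was read, its `_rev` here is stale
and that entry would be rejected as a conflict instead of being deleted.
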
>> 
>> k> For #2, you have to iterate over from the client side. Unless, someone
>> k> else has another idea.
>> 
>> k> K.
>> k> ---
>> k> http://www.pcapr.net
>> k> http://twitter.com/pcapr
>> k> http://labs.mudynamics.com
>> 
>> k> On Mon, Nov 15, 2010 at 7:01 PM, Neville Franks <subs@surfulater.com> wrote:
>>>> Hi,
>>>> I am just learning about CouchDB so please excuse this nooby question.
>>>> 
>>>> I've read lots the past few days including
>>>> http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API and a fair bit
>>>> of two of the online CouchDB books.
>>>> 
>>>> My question is how do I do some simple things like:
>>>> 
>>>> 1) Delete all documents where key.value = xxx
>>>> 
>>>> 2) Update all documents where key.value = xxx so value = yyy
>>>> 
>>>> I want the DB to do these, not for me to have to iterate through the
>>>> DB in code. From what I've read about Views, they are read-only and
>>>> therefore can't be used in update/delete operations.
>>>> 
>>>> I've read lots on views and CouchDB seems great at getting information
>>>> out in all sorts of ways, however basic bulk update/delete operations
>>>> are so far eluding me.
>>>> 
>>>> My main exposure to DBs at this time is using SQLite and these sorts
>>>> of things are of course easy and quick to do in SQL.
>>>> 
>>>> Hopefully I'm missing something obvious.
>>>> 
>>>> ---
>>>> Neville Franks,  http://www.surfulater.com
>> 
>> Neville Franks,  http://www.surfulater.com
>> 
> 

