couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Bulk Updates in CouchDB
Date Wed, 17 Nov 2010 09:56:38 GMT
Hi Neville,

The CouchDB Book has a chapter on solving common tasks in CouchDB if you
have an RDBMS background:

  http://guide.couchdb.org/editions/1/en/cookbook.html

It doesn't cover your case, but I intend to add it.

Cheers
Jan
-- 

On 16 Nov 2010, at 23:02, Neville Franks wrote:

> Hi Jan,
> Thanks for taking the time to respond in detail. I imagine most people
> coming from SQL land will face various brick walls while trying to
> learn the new paradigms of document-oriented DBs.
> 
> I think it is time for me to stop reading and dig my heels in with a
> "proof of concept" sample app. No doubt this will be challenging,
> however I'm sure I'll learn a lot. Hopefully the batch update methods
> you discuss will be satisfactory both from a coding and performance
> perspective.
> 
> I'm heartened to know that someone else feels that having bulk
> editing on the server is a great idea and not some stupid newbie
> comment on my part.
> 
> My overriding interest in CouchDB is its replication capabilities and
> offline/online use case. I have not found any other database that does
> this as easily and, hopefully, as effectively as CouchDB. My plan was to
> implement my own replication capability using SQLite, which I already
> use, however this is a complex task, one which I'll happily leave to
> others.
> 
> I'm sure more questions will follow. The SQLite community is very
> active and helpful, and from what I've seen, so is CouchDB.
> 
> Tuesday, November 16, 2010, 9:48:21 PM, you wrote:
> 
> JL> Hi Neville,
> 
> JL> On 16 Nov 2010, at 10:44, Neville Franks wrote:
> 
>>> Thanks for the prompt response. I have to say that I am very, very
>>> surprised that what seem to me to be such basic operations aren't
>>> available natively within CouchDB.
> 
> JL> This is less a basic operation that isn't supported and more a
> JL> sign of the difference in philosophy between CouchDB and, say,
> JL> SQLite.
> 
> 
>>> This is probably a deal breaker for my use and, I would have thought,
>>> for many others. My concern is iterating over a large number of documents
>>> on a remote server just to do simple updates. It means I need to do
>>> several HTTP requests (GET/PUT/DELETE) for each document in a set of
>>> possibly thousands or tens of thousands. I'm in Australia and the
>>> server is in the US, and I would imagine this would make an application
>>> unusable.
> 
> JL> A couple of thoughts:
> 
> JL>  - How often does that run? Of course, the operation will be slower
> JL>    than telling the server to update a bunch of fields*, but if it is
> JL>    a rare occurrence, it may not be that big a deal.
> 
> JL>     * CouchDB doesn't have a notion of "fields", hence this operation
> JL>       proves a little tricky.
> 
> JL>  - CouchDB could handle the bulk updating for you, but it'd essentially
> JL>    do the same things you'd do, except for the HTTP overhead. If you set it
> JL>    up smartly, you create a view to fetch all documents you want to edit,
> JL>    update them, and then send all the changes back to CouchDB in a single
> JL>    bulk request.
> 
> JL>    Again, yes, you could save the time spent transferring that data back
> JL>    and forth, but the main cost of the operation is more likely disk I/O,
> JL>    which will happen regardless.
> 
> JL>  - One mode of operation for CouchDB is distributed, offline. You could
> JL>    have a CouchDB instance locally in Australia and make all your changes
> JL>    there in a low-latency situation (but then, you'd probably only need two
> JL>    requests for 10-100k documents) and later replicate your results to the
> JL>    US.
> 
> JL>  - Even if CouchDB were to support bulk editing on the server (I think
> JL>    it would be a great addition), it wouldn't guarantee any transaction
> JL>    semantics. (You didn't name that specifically, but it usually comes up
> JL>    quickly in these discussions.) This means that while the update operation
> JL>    is in progress, other clients could see some documents in the pre-update
> JL>    state and some in the post-update state, and your app needs to be OK with that.
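
The "edit locally, replicate later" option in the list above comes down to one request against the local instance. As a rough sketch only: CouchDB accepts a POST to /_replicate with a JSON body naming a source and a target. The helper name, the "notes" database and the couch.example.com host below are made up for illustration, and the request is built but not sent.

```python
import json
from urllib import request


def replication_request(local_server, source_db, target_url):
    """Build (but do not send) a POST to the local /_replicate endpoint.

    local_server, source_db and target_url are placeholders for this sketch.
    """
    body = json.dumps({"source": source_db, "target": target_url}).encode("utf-8")
    return request.Request(
        local_server.rstrip("/") + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local database pushed to a hypothetical remote server.
req = replication_request(
    "http://localhost:5984",
    "notes",
    "http://couch.example.com:5984/notes",
)
# request.urlopen(req) would start the push; it is left out so this runs offline.
```

Swapping source and target gives a pull in the other direction.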
> 
> 
>>> I am getting the feeling that CouchDB is great for storing lots of
>>> information and getting it back in lots of interesting ways but not a
>>> good fit for typical CRUD stuff that's done in SQL all the time.
>>> Please correct me if I'm wrong.
> 
> JL> It is plenty good for CRUD operations. Except for the case where you
> JL> want to emulate `UPDATE foo SET bar="baz" WHERE qux="quux";`.
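
A client-side emulation of that statement, along the lines of the view-then-bulk-request approach, might look roughly like the sketch below. It assumes the rows come from a view or _all_docs query made with include_docs=true and that the changes go back in one POST to /<db>/_bulk_docs. The function names and the sample documents are invented for illustration, and the HTTP calls themselves are left out so the sketch runs offline.

```python
def build_bulk_update(rows, match_field, match_value, set_field, set_value):
    """Return a _bulk_docs payload for docs where doc[match_field] == match_value."""
    changed = []
    for row in rows:
        doc = row.get("doc")
        if doc is None or doc.get(match_field) != match_value:
            continue
        updated = dict(doc)  # keeps _id and _rev so CouchDB can detect conflicts
        updated[set_field] = set_value
        changed.append(updated)
    return {"docs": changed}


def split_results(results):
    """Split the per-document reply into saved ids and (id, error) pairs."""
    saved = [r["id"] for r in results if "error" not in r]
    failed = [(r["id"], r["error"]) for r in results if "error" in r]
    return saved, failed


# Sample rows shaped like a view response fetched with include_docs=true.
rows = [
    {"id": "a", "key": "quux", "value": None,
     "doc": {"_id": "a", "_rev": "1-x", "qux": "quux", "bar": "old"}},
    {"id": "b", "key": "other", "value": None,
     "doc": {"_id": "b", "_rev": "1-y", "qux": "other", "bar": "old"}},
]
payload = build_bulk_update(rows, "qux", "quux", "bar", "baz")

# Sample reply shaped like a _bulk_docs response with one conflict.
reply = [
    {"id": "a", "rev": "2-z"},
    {"id": "c", "error": "conflict", "reason": "Document update conflict."},
]
saved, failed = split_results(reply)
```

Because each document in the bulk request succeeds or fails on its own, the failed list is where the lack of transaction semantics shows up: the app has to re-fetch and retry those documents itself.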
> 
> JL> The question then is how frequent "all the time" is. I know I've done
> JL> my share of bulk updates in SQL land, but the apps I build rarely use
> JL> that feature as one of the things they do all the time.
> 
> JL> I can see that background processes and cronjobs may have more use for
> JL> that particular feature.
> 
> JL> --
> 
> JL> Come to think of it, I think I'll explore my old idea of "compaction
> JL> with a transformation function" again :)
> 
> JL> Cheers
> JL> Jan
> 
> 
> --
> Best regards,
>  Neville Franks, Author of Surfulater - Your off-line Digital Reference Library
>  Soft As It Gets Pty Ltd,  http://www.surfulater.com - Download your copy now.
>  Victoria, Australia       Blog: http://blog.surfulater.com 
> 
> 

