On Thu, Jan 14, 2010 at 02:15:54PM -0800, Chris Anderson wrote:
> > More difficult would be to allow bulk *updates* via this mechanism, because
> > having parsed out the IDs you'd need to be able to fetch existing docs,
> > modify and write back.
> >
>
> If the CSV source was responsible for tracking _revs then it could work easily.

What I mean is, couchdb itself can't parse out the _id and _rev as the stream
comes in (since the CSV parsing isn't built into couchdb), so it can't
pre-fetch the docs. The doc fetch requests would have to be bounced back to
couchdb core. e.g.

    data over HTTP          opaque data
    -------------> couchdb ----------------> updater function
                            _ids and _revs
                           <----------------
                            original docs
                           ---------------->
                            updated docs
                           <----------------

But if we allow streaming that's going to be awkward; the 'opaque data'
stream may have to be interleaved with the 'original docs' stream.

Then after updating the docs, what is couchdb going to do with the results
of each save, i.e. success/fail and new _revs? It could send them back to
the client in JSON format like the result of a _bulk_save, but that won't
mean much to most users. So you probably also want:

                            save statuses
                           ---------------->
                            response stream or HTML status page
                           <----------------

If you want to stream all this, and you don't want couchjs functions to be
able to make asynchronous callbacks to couchdb, you could run three separate
couchjs processes in parallel:

    data over HTTP          opaque data
    -------------> couchdb ----------------> parser function
                            _ids,_revs and updates
                           <----------------

                            JSON docs+updates
                           ----------------> updater function
                            updated docs
                           <----------------

                            doc statuses
                           ----------------> results list function
                            opaque data
                           <----------------

Maybe there's a way to do this multipass load using some sort of staging
docs in the database itself. Imagine saving '_bulk_docs' requests and
responses as docs themselves, then spooling them out using a list function.

It could be simpler without streaming:

    -------> blob
    <------- _all_docs request
    -------> _all_docs response
    <------- _bulk_save request
    -------> _bulk_save response
    <------- blob

That would let you import 10MB of data via a couchapp, but for 10GB you'd
need a custom app in front.
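
To make the non-streaming exchange concrete, here is a rough sketch of the
three steps as plain functions, written in Python rather than couchjs so it
can be run standalone. It is only an illustration under assumptions: the
function names, the `id` CSV column and the plain-text status format are all
made up for the example, and the save step is assumed to go to CouchDB's
`_bulk_docs` endpoint. No HTTP is done here; each function just builds or
consumes the JSON body for one arrow in the diagram.

```python
import csv
import io


def parse_blob(blob):
    """Parse the incoming CSV blob into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(blob)))


def all_docs_request(rows, id_field="id"):
    """Build a body for POST /db/_all_docs asking for the listed keys.

    The 'id' column name is an assumption made for this example.
    """
    return {"keys": [row[id_field] for row in rows]}


def bulk_docs_request(rows, all_docs_response, id_field="id"):
    """Build a body for POST /db/_bulk_docs, adding _rev to existing docs."""
    revs = {}
    for entry in all_docs_response["rows"]:
        # Keys that do not exist come back with an "error" and no "value".
        if "value" in entry:
            revs[entry["key"]] = entry["value"]["rev"]
    docs = []
    for row in rows:
        doc = {k: v for k, v in row.items() if k != id_field}
        doc["_id"] = row[id_field]
        if doc["_id"] in revs:
            doc["_rev"] = revs[doc["_id"]]
        docs.append(doc)
    return {"docs": docs}


def summarise(bulk_docs_response):
    """Turn per-doc save results into a plain-text status blob."""
    lines = []
    for result in bulk_docs_response:
        if "error" in result:
            lines.append(f"{result['id']}: FAILED ({result['error']})")
        else:
            lines.append(f"{result['id']}: ok, rev {result['rev']}")
    return "\n".join(lines)
```

Note that this version overwrites each existing doc with the CSV row rather
than merging into it; to modify the stored docs instead, the `_all_docs`
request would need `include_docs=true` so the originals are available. The
whole blob is also held in memory, which is why it only suits the 10MB case.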