couchdb-user mailing list archives

From Ingo Radatz <>
Subject connection handling in long running bulk doc uploads
Date Mon, 16 Mar 2015 10:50:09 GMT

My use case is to upload around 100k JSON documents (500 MB) to CouchDB via a POST to the
_bulk_docs handler. Everything works well, and because some schema validation is involved,
the upload time of an hour was not surprising.

Unfortunately, it turns out that some proxy servers (e.g. squid) at the sender's site cancel
the connection (a default timeout config param is reached) while they should be waiting for
the response of the _bulk_docs handler. Because changing the config of foreign proxy servers
is possible in theory but not in practice, I'm looking for a solution that lets such proxies
know that the connection is healthy and should not be canceled.

- Maybe a streamed approach for the up- and download phases? 
- Maybe a heartbeat?
- Any other idea?

A quick and dirty solution is in place: uploads are now made in smaller batches. That cannot
be the final solution, because it depends on so many variables that it remains just a hope
"to finish before some timeout hits".
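For illustration, the batching workaround could look roughly like the sketch below. It is
not the actual upload script; the function names, the batch size of 1000, the 60 second
client timeout and the example URL are all assumptions. It relies only on the documented
_bulk_docs request shape, a JSON object with a "docs" array sent via POST.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def upload_in_batches(docs, url, size=1000, timeout=60):
    """POST each batch to _bulk_docs as its own short-lived request.

    `url` is a hypothetical endpoint such as
    http://localhost:5984/mydb/_bulk_docs; `size` and `timeout` are
    assumed values, not recommendations.
    """
    results = []
    for chunk in batches(docs, size):
        body = json.dumps({"docs": chunk}).encode("utf-8")
        req = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # Each request finishes on its own, so no single connection
        # has to stay idle for the whole upload.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            results.extend(json.load(resp))
    return results
```

Each batch still has to finish before the proxy timeout, so this only narrows the problem
rather than removing it.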

Here is a short recap of the failed upload process:

1. send 10k docs as one JSON payload to _bulk_docs
2. wait longer than the default timeout in the proxy (e.g. 4 minutes) for the response
3. get disconnected by the proxy without a response from CouchDB
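A rough back-of-the-envelope check for the steps above, assuming the throughput is uniform
(which it probably is not): 100k docs in about an hour is roughly 28 docs per second, so a
4 minute timeout leaves room for about 6,600 docs per request, and a 10k batch would run
past it. The helper below is only a sketch; its name and the safety factor are assumptions.

```python
import math


def max_batch_size(total_docs, total_seconds, proxy_timeout_seconds, safety=0.5):
    """Estimate how many docs fit into one request before the proxy times out.

    Assumes a constant processing rate; `safety` (an assumed margin)
    scales the estimate down to leave headroom.
    """
    docs_per_second = total_docs / total_seconds
    return math.floor(docs_per_second * proxy_timeout_seconds * safety)


# 100k docs in one hour, 4 minute proxy timeout, half of it kept as margin
print(max_batch_size(100_000, 3600, 240))
```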

Best, ingo
