couchdb-dev mailing list archives

From Jan Lehnardt <>
Subject Re: Why does CouchDb need attachment length?
Date Wed, 20 Nov 2013 11:39:59 GMT

On 19 Nov 2013, at 16:36 , Nick North <> wrote:

> I agree - any sensible HTTP client will do chunking or supply a content
> length for the entire request. It's getting lengths of the individual
> attachments and putting them into the initial JSON part of the request
> that's difficult.

That’s the part where we accept patches :)


> Luckily my main requirements are for file, string, or byte array
> attachments, rather than arbitrary streams, and it's easy to get the
> lengths of all of those without having to do stream traversal. So I'm
> leaving the problem alone for a while and also assuming non-chunked
> requests. But I would like to come back to it at some point, as there are
> hints in the Apache HTTP client that it may chunk requests even if you ask
> it not to, so the ability to accept chunked requests seems useful for
> CouchDb.
> Nick
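Editorial sketch, not part of the original message: the point about files, strings, and byte arrays can be illustrated in Python. The helper name `attachment_length` is hypothetical, and UTF-8 is an assumed encoding for string attachments. None of the three cases needs a pass over a stream.

```python
import os
from pathlib import Path


def attachment_length(source):
    """Return an attachment's length in bytes without traversing a stream.

    Hypothetical helper for illustration; accepts a Path, bytes-like, or str.
    """
    if isinstance(source, Path):
        # A file on disk: the filesystem already knows the size.
        return os.path.getsize(source)
    if isinstance(source, (bytes, bytearray)):
        return len(source)
    if isinstance(source, str):
        # Assumes the string is sent as UTF-8; count bytes, not characters.
        return len(source.encode("utf-8"))
    raise TypeError("unsupported attachment source: %r" % type(source))
```

An arbitrary stream falls through to the `TypeError`, which is exactly the case the thread describes as difficult.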
> On 19 November 2013 01:45, Jan Lehnardt <> wrote:
>> On 16 Nov 2013, at 13:31 , Nick North <> wrote:
>>> One more thought before I leave off for the moment. Although this
>>> endpoint was built for the replicator, it is very useful for other
>>> clients, as it is the only way to submit a document and its attachments
>>> in a single action. This is important if you're not allowed to update
>>> documents or if you want to guarantee that readers of documents in the
>>> database and its replicas never see a partial set of the document and
>>> its attachments. This use case suggests to me that the endpoint should
>>> be easy to use for everyone, if that can be done without harming
>>> replication. But the chunking business means I need to think some more
>>> before making a proposal on it.
>> The API should totally work as simply as possible for clients other than
>> the replicator. It just hasn’t been built that way yet and we are happy
>> to accept patches :) The mention that it was custom built for the
>> replicator is just to explain the current limitations.
>> That said, I think you either need a length OR chunking, but any
>> self-respecting HTTP client should make that trivial for you as the end
>> user :)
>> Best
>> Jan
>> --
>>> Nick
>>>> On 16 Nov 2013, at 18:57, Robert Newson <> wrote:
>>>> Ah, no. HTTP requires either content length or a chunked encoding. We
>>>> could certainly enhance this. My point was that this endpoint was built
>>>> for the replicator.
>>>>> On 16 Nov 2013 18:54, "Nick North" <> wrote:
>>>>> Thanks for the quick reply. I see what you're saying, though it still
>>>>> seems to me that CouchDb could accept incoming non-chunked requests
>>>>> where individual attachments do not have their lengths specified. They
>>>>> could be calculated on receipt and kept for use in replication. That
>>>>> would make use of client libraries like the Apache Java HttpClient
>>>>> easier. But maybe my lack of detailed knowledge of HTTP is showing.
>>>>> Nick
>>>>>> On 16 Nov 2013, at 18:24, Robert Newson <> wrote:
>>>>>> Because we haven't written the code to handle multipart/related
>>>>>> responses where each item is also a chunked response, and we haven't
>>>>>> done that because the replicator could always form a non-chunked
>>>>>> request since it already knows the sizes.
>>>>>> B.
>>>>>>> On 16 November 2013 18:11, Nick North <> wrote:
>>>>>>> I'm working with CouchDb documents with multiple attachments,
>>>>>>> submitted using MIME multipart/related requests. In this case the
>>>>>>> document has to have an "_attachments" property specifying each
>>>>>>> attachment's content type and length as described here<>.
>>>>>>> The document and attachments are MIME-encoded and submitted in a
>>>>>>> single request.
>>>>>>> Although this works, programming it is awkward as each attachment's
>>>>>>> length must be known in advance in order to populate the _attachments
>>>>>>> property. Attachments are often in the form of streams, and finding
>>>>>>> the length means having to read through the whole stream. Then you
>>>>>>> have to spool through the stream again when submitting the HTTP
>>>>>>> request. (In some languages I suspect the only way to do this is to
>>>>>>> buffer the entire stream contents in memory.) If the length did not
>>>>>>> have to be put into the initial JSON object, then the stream could
>>>>>>> just be passed straight through to the HTTP request with no need for
>>>>>>> reading twice or buffering in memory.
>>>>>>> So my question is: why does CouchDb require the length to be
>>>>>>> supplied? It's definitely necessary as I've tried giving the wrong
>>>>>>> length, or no length at all, and that causes the request to fail. But
>>>>>>> a quick look at the Erlang source suggests that the length is not
>>>>>>> used when parsing the request, and presumably that parsing process
>>>>>>> could calculate each attachment's length for use later on if it's
>>>>>>> needed.
>>>>>>> If, in principle, the length could be dropped when submitting
>>>>>>> requests, then I'd be interested in trying to modify the code to make
>>>>>>> that possible. But, if there is a good reason why it has to be
>>>>>>> supplied, then I don't want to waste time working out what's going on
>>>>>>> in the Erlang. So any advice on why attachments were designed as they
>>>>>>> are would be very welcome.
>>>>>>> thanks,
>>>>>>> Nick
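
Editorial sketch, not part of the original thread: the request described in the first message can be assembled roughly as follows when every attachment is already held as bytes. The function name `build_multipart_related` is hypothetical. The `"follows": true` marker in each `_attachments` entry is an assumption based on CouchDB's documented multipart API rather than anything stated in this thread, so check it against the documentation for your version. Attachment parts are emitted in the same order as they appear in `_attachments`.

```python
import json
import uuid


def build_multipart_related(doc, attachments, boundary=None):
    """Return (content_type, body) for a multipart/related document upload.

    doc         -- dict holding the document fields
    attachments -- list of (name, content_type, data_bytes) tuples
    """
    boundary = boundary or uuid.uuid4().hex
    doc = dict(doc)  # shallow copy so the caller's dict is not modified
    # Each length must be known up front, which is the constraint
    # discussed in the thread.
    doc["_attachments"] = {
        name: {"follows": True, "content_type": ctype, "length": len(data)}
        for name, ctype, data in attachments
    }
    sep = b"--" + boundary.encode("ascii")
    # First part: the JSON document itself.
    parts = [sep, b"Content-Type: application/json", b"",
             json.dumps(doc).encode("utf-8")]
    # Remaining parts: raw attachment bodies, in _attachments order.
    for name, ctype, data in attachments:
        parts += [sep, b"Content-Type: " + ctype.encode("ascii"), b"", data]
    parts.append(sep + b"--")  # closing boundary
    body = b"\r\n".join(parts)
    return 'multipart/related; boundary="%s"' % boundary, body
```

The returned body has a known total size, so a client can send it with a Content-Length header and avoid chunked transfer encoding altogether.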
