couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Why does CouchDb need attachment length?
Date Wed, 20 Nov 2013 11:39:59 GMT

On 19 Nov 2013, at 16:36 , Nick North <north.n@gmail.com> wrote:

> I agree - any sensible HTTP client will do chunking or supply a content
> length for the entire request. It's getting lengths of the individual
> attachments and putting them into the initial JSON part of the request
> that's difficult.

That’s the part where we accept patches :)

Best
Jan
--


> 
> Luckily my main requirements are for file, string, or byte array
> attachments, rather than arbitrary streams, and it's easy to get the
> lengths of all of those without having to do stream traversal. So I'm
> leaving the problem alone for a while and also assuming non-chunked
> requests. But I would like to come back to it at some point, as there are
> hints in the Apache HTTP client that it may chunk requests even if you ask
> it not to, so the ability to accept chunked requests seems useful for
> CouchDb.
> 
> Nick
> 
> 
> On 19 November 2013 01:45, Jan Lehnardt <jan@apache.org> wrote:
> 
>> 
>> On 16 Nov 2013, at 13:31 , Nick North <north.n@gmail.com> wrote:
>> 
>>> One more thought before I leave off for the moment. Although this
>>> endpoint was built for the replicator, it is very useful for other clients,
>>> as it is the only way to submit a document and its attachments in a single
>>> action. This is important if you're not allowed to update documents or if
>>> you want to guarantee that readers of documents in the database and its
>>> replicas never see a partial set of the document and its attachments. This
>>> use case suggests to me that the endpoint should be easy to use for
>>> everyone, if that can be done without harming replication. But the chunking
>>> business means I need to think some more before making a proposal on it.
>> 
>> The API should totally work as simply as possible for clients other than
>> the replicator. It just hasn’t been built that way yet and we are happy to
>> accept patches :) The mention that it was custom built for the replicator
>> is just to explain the current limitations.
>> 
>> That said, I think you either need a length OR chunking, but any
>> self-respecting HTTP client should make that trivial for you as the end
>> user :)
>> 
>> Best
>> Jan
>> --
>> 
>> 
>>> 
>>> Nick
>>> 
>>>> On 16 Nov 2013, at 18:57, Robert Newson <rnewson@apache.org> wrote:
>>>> 
>>>> Ah, no. HTTP requires either content length or a chunked encoding. We
>>>> could certainly enhance this. My point was that this endpoint was built
>>>> for the replicator.
>>>>> On 16 Nov 2013 18:54, "Nick North" <north.n@gmail.com> wrote:
>>>>> 
>>>>> Thanks for the quick reply. I see what you're saying, though it still
>>>>> seems to me that CouchDb could accept incoming non-chunked requests
>>>>> where individual attachments do not have their lengths specified. They
>>>>> could be calculated on receipt and kept for use in replication. That
>>>>> would make use of client libraries like the Apache Java HttpClient
>>>>> easier. But maybe my lack of detailed knowledge of HTTP is showing.
>>>>> 
>>>>> Nick
>>>>> 
>>>>>> On 16 Nov 2013, at 18:24, Robert Newson <rnewson@apache.org> wrote:
>>>>>> 
>>>>>> Because we haven't written the code to handle multipart/related
>>>>>> responses where each item is also a chunked response, and we haven't
>>>>>> done that because the replicator could always form a non-chunked
>>>>>> request since it already knows the sizes.
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>> 
>>>>>>> On 16 November 2013 18:11, Nick North <north.n@gmail.com> wrote:
>>>>>>> I'm working with CouchDb documents with multiple attachments,
>>>>>>> submitted using MIME multipart/related requests. In this case the
>>>>>>> document JSON has to have an "_attachments" property specifying each
>>>>>>> attachment's name, content type and length as described here
>>>>>>> <http://wiki.apache.org/couchdb/HTTP_Document_API#Multiple_Attachments>.
>>>>>>> The document and attachments are MIME-encoded and submitted in a
>>>>>>> single request.
>>>>>>> 
>>>>>>> Although this works, programming it is awkward as each attachment's
>>>>>>> length must be known in advance in order to populate the _attachments
>>>>>>> property. Attachments are often in the form of streams, and finding
>>>>>>> the length means having to read through the whole stream. Then you
>>>>>>> have to spool through the stream again when submitting the HTTP
>>>>>>> request. (In some languages I suspect the only way to do this is to
>>>>>>> buffer the entire stream contents in memory.) If the length did not
>>>>>>> have to be put into the initial JSON object, then the stream could
>>>>>>> just be passed straight through to the HTTP request with no need for
>>>>>>> reading twice or buffering in memory.
>>>>>>> 
>>>>>>> So my question is: why does CouchDb require the length to be
>>>>>>> supplied? It's definitely necessary as I've tried giving the wrong
>>>>>>> length, or no length at all, and that causes the request to fail. But
>>>>>>> a quick look at the Erlang source suggests that the length is not
>>>>>>> used when parsing the request, and presumably that parsing process
>>>>>>> could calculate each attachment's length for use later on if it's
>>>>>>> needed.
>>>>>>> 
>>>>>>> If, in principle, the length could be dropped when submitting
>>>>>>> requests, then I'd be interested in trying to modify the code to make
>>>>>>> that possible. But, if there is a good reason why it has to be
>>>>>>> supplied, then I don't want to waste time working out what's going on
>>>>>>> in the Erlang. So any advice on why attachments were designed as they
>>>>>>> are would be very welcome. Many thanks,
>>>>>>> 
>>>>>>> Nick
>>>>> 
>> 
>> 
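
For readers following the thread, here is a rough sketch of the request shape being discussed: a multipart/related body whose first part is the document JSON, with an "_attachments" entry giving each attachment's content type and length, followed by the attachment bodies in the same order. It covers only the easy case Nick mentions, where attachments are already byte arrays and their lengths are known without traversing a stream. The function name, parameter names, and default boundary are hypothetical and chosen for illustration; the "follows" flag is taken from the CouchDB attachment documentation, not from this thread.

```python
import json


def build_multipart_related(doc, attachments, boundary="abc123"):
    """Return (content_type, body) for a multipart/related document PUT.

    doc: dict holding the document fields (not modified).
    attachments: list of (name, content_type, data_bytes) tuples. The
    lengths come from the in-memory bytes, so no stream is read twice.
    NOTE: hypothetical helper, not part of any CouchDB client library.
    """
    # Describe every attachment up front, including its byte length.
    stubs = {}
    for name, content_type, data in attachments:
        stubs[name] = {
            "follows": True,
            "content_type": content_type,
            "length": len(data),
        }
    full_doc = dict(doc, _attachments=stubs)

    delimiter = b"--" + boundary.encode("ascii")
    parts = [
        delimiter,
        b"Content-Type: application/json",
        b"",
        json.dumps(full_doc).encode("utf-8"),
    ]
    # Attachment bodies follow in the same order as in "_attachments".
    for _name, _content_type, data in attachments:
        parts += [delimiter, b"", data]
    parts.append(delimiter + b"--")  # closing delimiter

    body = b"\r\n".join(parts)
    content_type = 'multipart/related; boundary="%s"' % boundary
    return content_type, body
```

Because the whole body is built in memory, its total size is len(body), so a client can send it non-chunked with an ordinary Content-Length header. The sketch does not check that the boundary string is absent from the attachment data, which a real client would need to do.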

