couchdb-dev mailing list archives

From Nick North <nort...@gmail.com>
Subject Re: Why does CouchDb need attachment length?
Date Tue, 19 Nov 2013 15:36:01 GMT
I agree - any sensible HTTP client will do chunking or supply a content
length for the entire request. It's getting lengths of the individual
attachments and putting them into the initial JSON part of the request
that's difficult.

Luckily my main requirements are for file, string, or byte array
attachments, rather than arbitrary streams, and it's easy to get the
lengths of all of those without having to do stream traversal. So I'm
leaving the problem alone for a while and also assuming non-chunked
requests. But I would like to come back to it at some point, as there are
hints in the Apache HTTP client that it may chunk requests even if you ask
it not to, so the ability to accept chunked requests seems useful for
CouchDb.

Nick
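
A minimal sketch of the point above, assuming attachments arrive as a file path, a string, or a byte array: in each case the byte length is available without reading a stream twice. The function name `attachment_length` and the convention that files are passed as `pathlib.Path` objects (so they can be told apart from string content) are assumptions made for illustration, not part of any CouchDB client.

```python
from pathlib import Path


def attachment_length(source, encoding="utf-8"):
    """Return the attachment's length in bytes without streaming through it.

    Hypothetical helper: files are passed as pathlib.Path, text as str,
    and raw data as bytes or bytearray.
    """
    if isinstance(source, Path):
        # The file system already knows the size; no read is needed.
        return source.stat().st_size
    if isinstance(source, str):
        # The length must be counted in encoded bytes, not characters.
        return len(source.encode(encoding))
    if isinstance(source, (bytes, bytearray)):
        return len(source)
    raise TypeError("unsupported attachment type: %r" % type(source))
```

Note that a string's length depends on the encoding used on the wire, so the same encoding has to be used when the attachment body is written.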


On 19 November 2013 01:45, Jan Lehnardt <jan@apache.org> wrote:

>
> On 16 Nov 2013, at 13:31 , Nick North <north.n@gmail.com> wrote:
>
> > One more thought before I leave off for the moment. Although this
> > endpoint was built for the replicator, it is very useful for other
> > clients, as it is the only way to submit a document and its attachments
> > in a single action. This is important if you're not allowed to update
> > documents or if you want to guarantee that readers of documents in the
> > database and its replicas never see a partial set of the document and
> > its attachments. This use case suggests to me that the endpoint should
> > be easy to use for everyone, if that can be done without harming
> > replication. But the chunking business means I need to think some more
> > before making a proposal on it.
>
> The API should totally work as simply as possible for clients other than
> the replicator. It just hasn’t been built that way yet and we are happy to
> accept patches :) The mention that it was custom-built for the replicator
> is just to explain the current limitations.
>
> That said, I think you either need a length OR chunking, but any
> self-respecting HTTP client should make that trivial for you as the end
> user :)
>
> Best
> Jan
> --
>
>
> >
> > Nick
> >
> >> On 16 Nov 2013, at 18:57, Robert Newson <rnewson@apache.org> wrote:
> >>
> >> Ah, no. HTTP requires either content length or a chunked encoding. We
> >> could certainly enhance this. My point was that this endpoint was built
> >> for the replicator.
> >>> On 16 Nov 2013 18:54, "Nick North" <north.n@gmail.com> wrote:
> >>>
> >>> Thanks for the quick reply. I see what you're saying, though it still
> >>> seems to me that CouchDb could accept incoming non-chunked requests
> >>> where individual attachments do not have their lengths specified. They
> >>> could be calculated on receipt and kept for use in replication. That
> >>> would make use of client libraries like the Apache Java HttpClient
> >>> easier. But maybe my lack of detailed knowledge of HTTP is showing.
> >>>
> >>> Nick
> >>>
> >>>> On 16 Nov 2013, at 18:24, Robert Newson <rnewson@apache.org> wrote:
> >>>>
> >>>> Because we haven't written the code to handle multipart/related
> >>>> responses where each item is also a chunked response, and we haven't
> >>>> done that because the replicator could always form a non-chunked
> >>>> request since it already knows the sizes.
> >>>>
> >>>> B.
> >>>>
> >>>>
> >>>>> On 16 November 2013 18:11, Nick North <north.n@gmail.com> wrote:
> >>>>> I'm working with CouchDb documents with multiple attachments,
> >>>>> submitted using MIME multipart/related requests. In this case the
> >>>>> document JSON has to have an "_attachments" property specifying each
> >>>>> attachment's name, content type and length as described here
> >>>>> <http://wiki.apache.org/couchdb/HTTP_Document_API#Multiple_Attachments>.
> >>>>> The document and attachments are MIME-encoded and submitted in a
> >>>>> single request.
> >>>>>
> >>>>> Although this works, programming it is awkward as each attachment's
> >>>>> length must be known in advance in order to populate the _attachments
> >>>>> property. Attachments are often in the form of streams, and finding
> >>>>> the length means having to read through the whole stream. Then you
> >>>>> have to spool through the stream again when submitting the HTTP
> >>>>> request. (In some languages I suspect the only way to do this is to
> >>>>> buffer the entire stream contents in memory.) If the length did not
> >>>>> have to be put into the initial JSON object, then the stream could
> >>>>> just be passed straight through to the HTTP request with no need for
> >>>>> reading twice or buffering in memory.
> >>>>>
> >>>>> So my question is: why does CouchDb require the length to be
> >>>>> supplied? It's definitely necessary as I've tried giving the wrong
> >>>>> length, or no length at all, and that causes the request to fail. But
> >>>>> a quick look at the Erlang source suggests that the length is not used
> >>>>> when parsing the request, and presumably that parsing process could
> >>>>> calculate each attachment's length for use later on if it's needed.
> >>>>>
> >>>>> If, in principle, the length could be dropped when submitting
> >>>>> requests, then I'd be interested in trying to modify the code to make
> >>>>> that possible. But, if there is a good reason why it has to be
> >>>>> supplied, then I don't want to waste time working out what's going on
> >>>>> in the Erlang. So any advice on why attachments were designed as they
> >>>>> are would be very welcome. Many thanks,
> >>>>>
> >>>>> Nick
> >>>
>
>
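
The request shape discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not CouchDB's or any client library's code: the helper name `build_multipart_related` is hypothetical, and the `"follows": true` stub field and the rule that attachment parts appear in the same order as the `_attachments` entries reflect my understanding of the documented multipart/related format. Because every attachment is already held as bytes, each `length` is known up front and the whole request can be sent non-chunked with a `Content-Length`.

```python
import json
import uuid


def build_multipart_related(doc, attachments, boundary=None):
    """Build headers and body for a multipart/related document upload.

    attachments is a list of (name, content_type, data_bytes) tuples.
    Hypothetical helper for illustration only.
    """
    boundary = boundary or uuid.uuid4().hex
    doc = dict(doc)
    # Each stub declares the attachment's content type and byte length;
    # "follows" (assumed field) says the data comes in a later MIME part.
    doc["_attachments"] = {
        name: {"follows": True, "content_type": ctype, "length": len(data)}
        for name, ctype, data in attachments
    }
    delimiter = b"--" + boundary.encode("ascii")
    # First part: the document JSON, including the _attachments stubs.
    parts = [
        delimiter,
        b"Content-Type: application/json",
        b"",
        json.dumps(doc).encode("utf-8"),
    ]
    # Remaining parts: attachment bodies, in the same order as the stubs.
    for name, ctype, data in attachments:
        parts += [delimiter, b"Content-Type: " + ctype.encode("ascii"), b"", data]
    parts.append(delimiter + b"--")
    body = b"\r\n".join(parts) + b"\r\n"
    headers = {
        "Content-Type": 'multipart/related; boundary="%s"' % boundary,
        # The total length is known, so no chunked encoding is needed.
        "Content-Length": str(len(body)),
    }
    return headers, body
```

The returned headers and body could then be sent in a single PUT to the document URL with any HTTP client. Handling attachments that arrive as streams of unknown length would still need either a first pass to measure them or server support for chunked parts, which is the limitation the thread describes.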
