couchdb-dev mailing list archives

From Nick North <nort...@gmail.com>
Subject Re: Chunked input for multipart/related requests
Date Thu, 05 Dec 2013 19:47:53 GMT
To answer my own question on this: Option 3 turns out to be simple to
implement and apparently efficient, so I'm testing some code now and will
put together a pull request when I'm happy with it. The code now uses the
Mochiweb stream_body function for both chunked and unchunked transfers, so
I'm trying out a patched installation on the current hot topic: replication
of the NPM registry.
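
For readers following the thread, the general shape of Option 3 can be
sketched roughly as below. This is not the patch itself: it is written in
Python rather than Erlang, a thread stands in for the spawned process, and
push_stream, make_data_fun and read_all are hypothetical names invented for
the illustration. The empty block used as an end-of-stream marker is also
an assumption of the sketch.

```python
import queue
import threading


def push_stream(blocks, chunk_fun):
    """Hypothetical stand-in for a push-based reader such as stream_body:
    it walks the whole input and calls chunk_fun on every block."""
    for block in blocks:
        chunk_fun(block)
    chunk_fun(b"")  # empty block marks end of stream (assumption of this sketch)


def make_data_fun(blocks):
    """Wrap the push-based reader so the caller can pull one block at a time."""
    data_q = queue.Queue()  # producer -> consumer: data blocks
    ack_q = queue.Queue()   # consumer -> producer: acknowledgements

    def chunk_fun(block):
        data_q.put(block)  # hand the block to the consumer
        ack_q.get()        # wait until the consumer acknowledges it

    # The thread plays the role of the newly spawned process.
    threading.Thread(
        target=push_stream, args=(blocks, chunk_fun), daemon=True
    ).start()

    def data_fun():
        block = data_q.get()  # wait for the next block
        ack_q.put(None)       # acknowledge so the producer can read on
        return block, data_fun  # data plus the next pull function

    return data_fun


def read_all(data_fun):
    """Example consumer: pull blocks until the end-of-stream marker."""
    parts = []
    while True:
        block, data_fun = data_fun()
        if not block:
            return b"".join(parts)
        parts.append(block)
```

Calling read_all(make_data_fun([b"ab", b"cd"])) pulls the two blocks one at
a time. The producer never reads ahead by more than one block, because it
waits for each acknowledgement before continuing.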


Nick


On 4 December 2013 10:06, Nick North <north.n@gmail.com> wrote:

> As mentioned the other day, I'm hoping to add CouchDB support for chunked
> HTTP requests that contain a document and attachments as a single
> multipart/related MIME request, and I'm hoping the group can advise me on
> the best coding direction. Apologies in advance for the length and detail
> of the email, but there doesn't seem to be a shorter way to ask the
> question with a sensible amount of background.
>
> Parsing multipart requests happens
> in couch_httpd:parse_multipart_request/3. This function scans the request
> for the MIME boundary string, reading 4KB blocks of data as needed. The
> pieces of data between boundary strings are passed to callback functions
> for further processing. The function to read the next block of data is an
> argument to parse_multipart_request called DataFun; it returns the data
> block plus the function to be used as the next DataFun. I think of this as
> a pull-based approach: data is pulled from the request as needed, with the
> pull returning some data and a new pull function.
>
> The natural extension to handle chunked requests would be to provide an
> improved DataFun that can grab the next 4KB block from either a chunked or
> an unchunked request. So I looked for existing support for chunked requests
> that could be reused. The chunked equivalent of the couch_httpd:recv/2
> function that's used to pull 4KB blocks is the couch_httpd:recv_chunked/4
> function. This calls the Mochiweb stream_body/3 function which, it
> transpires, was created for use in CouchDB. However, this differs in
> philosophy from the recv function: while recv just hands back a block of
> data, stream_body reads the whole of the request and calls a ChunkFun
> parameter on each block of data that it reads. I think of this as a
> push-based approach: the entire stream is read and pushed into a callback
> function, one block at a time.
>
> I can think of three ways to fix the mismatch between the pull and
> push-based approaches and provide chunked multipart support:
>
>    1. Rework parse_multipart_request to be push-based. This would allow
>    reuse of stream_body, but at the cost of turning existing code inside out
>    to fit with its push approach.
>    2. Create a pull-based version of stream_body and probably try to get
>    it incorporated into Mochiweb. But having two similar versions of the same
>    code like this doesn't feel right.
>    3. Convert stream_body from push-based to pull-based by spawning it in
>    a new process that sends each block of data back to the
>    parse_multipart_request DataFun and then blocks until the message is
>    acknowledged. The DataFun receives the data when it needs to fetch the next
>    block, and then sends an acknowledgement.
>
> The third option feels neatest and is my preferred route. But my ignorance
> of Erlang means that I don't know whether this is potentially expensive.
> While a new process is very cheap, it would mean that all the request data
> is copied from that process to parse_multipart_request, and I don't know if
> that is very costly. That sort of copying already goes on
> in couch_doc:doc_from_multi_part_stream where the parser is spawned off and
> copies each document and attachment back to the parent process but I don't
> know if that means the copying is cheap, or if it's an unavoidable evil
> that shouldn't be reproduced elsewhere.
>
> I'd really appreciate any advice that the group can give me on the best
> option to follow, and why, or suggestions for options that I've missed
> altogether. Thanks in advance for your help,
>
> Nick
>
