perl-modperl mailing list archives

From Jeff Trawick <traw...@gmail.com>
Subject Re: mod_perl and Transfer-Encoding: chunked
Date Wed, 03 Jul 2013 20:41:08 GMT
On Wed, Jul 3, 2013 at 4:31 PM, Jim Schueler <jschueler@eloquency.com> wrote:

> In light of Joe Schaefer's response, I appear to be outgunned.  So, if
> nothing else, can someone please clarify whether "de-chunked" means
> re-assembled?


Yes, where "re-assembled" means converted back to the original data stream,
without any sort of transfer encoding.
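
To make "de-chunked" concrete, here is a rough sketch of what decoding the
chunked transfer coding (RFC 2616 section 3.6.1) amounts to. It is written in
Python only so that it runs standalone; `dechunk` is a made-up name for
illustration, not Apache's or mod_perl's actual code, and it assumes the whole
body is already in memory and ignores trailers.

```python
def dechunk(raw: bytes) -> bytes:
    """Decode a chunked body into the original byte stream (illustrative only)."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a hex size line terminated by CRLF.
        eol = raw.index(b"\r\n", pos)
        # Anything after ';' on the size line is a chunk extension; ignore it.
        size = int(raw[pos:eol].split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # last-chunk: optional trailers and a final CRLF follow
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


# A small JSON body sent as two chunks, followed by the terminating zero chunk.
wire = b'3\r\n{"a\r\n4\r\n":1}\r\n0\r\n\r\n'
print(dechunk(wire))  # b'{"a":1}'
```

The zero-size chunk is what marks the end of the body, which is why no
Content-Length is needed to know where a chunked request stops.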


>
>
>  -Jim
>
>
> On Wed, 3 Jul 2013, Jim Schueler wrote:
>
>> Thanks for the prompt response, but this is your question, not mine.  I
>> hardly need an RTFM for my trouble.
>>
>> I drew my conclusions using a packet sniffer.  And as far-fetched as my
>> answer may seem, it's more plausible than your theory that Apache or
>> modperl is decoding a raw socket stream.
>>
>> The crux of your question seems to be how the request content gets
>> magically re-assembled.  I don't think it was ever disassembled in the
>> first place.  But if you don't like my answer, and you don't want to ignore
>> it either, then please restate the question.  I can't find any definition
>> for unchunked, and Wiktionary's definition of de-chunk says to "break apart
>> a chunk", that is (counter-intuitively) chunk a chunk.
>>
>>
>>>             Second, if there's no Content-Length header then how
>>>             does one know how much
>>>             data to read using $r->read?
>>>
>>>             One answer is until $r->read returns zero bytes, of
>>>             course.  But, is
>>>             that guaranteed to always be the case, even for,
>>>             say, pipelined requests?
>>>             My guess is yes because whatever is de-chunking the
>>>
>>
>> read() is blocking.  So it never returns 0, even in a pipeline request
>> (if no data is available, it simply waits).  I don't wish to discuss the
>> merits here, but there is no technical imperative for a content-length
>> request in the request header.
>>
>> -Jim
>>
>>
>>
>>
>>
>>
>> On Wed, 3 Jul 2013, Bill Moseley wrote:
>>
>>> Hi Jim,
>>> This is the Transfer-Encoding: chunked I was writing about:
>>>
>>> http://tools.ietf.org/html/rfc2616#section-3.6.1
>>>
>>>
>>>
>>> On Wed, Jul 3, 2013 at 11:34 AM, Jim Schueler <jschueler@eloquency.com>
>>> wrote:
>>>       I played around with chunking recently in the context of media
>>>       streaming: The client is only requesting a "chunk" of data.
>>>        "Chunking" is how media players perform a "seek".  It was
>>>       originally implemented for FTP transfers:  E.g, to transfer a
>>>       large file in (say 10K) chunks.  In the case that you describe
>>>       below, if no Content-Length is specified, that indicates "send
>>>       the remainder".
>>>
>>>       From what I know, a "chunk" request header is used this way to
>>>       specify the server response.  It does not reflect anything about
>>>       the data included in the body of the request.  So first, I would
>>>       ask if you're confused about this request information.
>>>
>>>       Hypothetically, some browsers might try to upload large files in
>>>       small chunks and the "chunk" header might reflect a push
>>>       transfer.  I don't know if "chunk" is ever used for this
>>>       purpose.  But it would require the following characteristics:
>>>
>>>         1.  The browser would need to originally inquire if the server is
>>>             capable of this type of request.
>>>         2.  Each chunk of data will arrive in a separate and independent
>>>             HTTP request.  Not necessarily in the order they were sent.
>>>         3.  Two or more requests may be handled by separate processes
>>>             simultaneously that can't be written into a single destination.
>>>         4.  Somehow the server needs to request a resend if a chunk is
>>>             missing.  Solving this problem requires an imaginative use of
>>>             HTTP.
>>>
>>>       Sounds messy.  But might be appropriate for 100M+ sized uploads.
>>>        This *may* reflect your situation.  Can you please confirm?
>>>
>>>       For a single process, the incoming content-length is
>>>       unnecessary. Buffered I/O automatically knows when transmission
>>>       is complete.  The read() argument is the buffer size, not the
>>>       content length.  Whether you spool the buffer to disk or simply
>>>       enlarge the buffer should be determined by your hardware
>>>       capabilities.  This is standard IO behavior that has nothing to
>>>       do with HTTP chunk.  Without a "Content-Length" header, after
>>>       looping your read() operation, determine the length of the
>>>       aggregate data and pass that to Catalyst.
>>>
>>>       But if you're confident that the complete request spans several
>>>       smaller (chunked) HTTP requests, you'll need to address all the
>>>       problems I've described above, plus the problem of re-assembling
>>>       the whole thing for Catalyst.  I don't know anything about
>>>       Plack, maybe it can perform all this required magic.
>>>
>>>       Otherwise, if the whole purpose of the Plack temporary file is
>>>       to pass a file handle, you can pass a buffer as a file handle.
>>>        Used to be IO::String, but now that functionality is built into
>>>       the core.
>>>
>>>       By your last paragraph, I'm really lost.  Since you're already
>>>       passing the request as a file handle, I'm guessing that Catalyst
>>>       creates the temporary file for the *response* body.  Can you
>>>       please clarify?  Also, what do you mean by "de-chunking"?  Is
>>>       that the same thing as re-assembling?
>>>
>>>       Wish I could give a better answer.  Let me know if this helps.
>>>
>>>       -Jim
>>>
>>>
>>>       On Tue, 2 Jul 2013, Bill Moseley wrote:
>>>
>>>             For requests that are chunked (Transfer-Encoding:
>>>             chunked and no
>>>             Content-Length header) calling $r->read returns
>>>             unchunked data from the
>>>             socket.
>>>             That's indeed handy.  Is that mod_perl doing that
>>>             un-chunking or is it
>>>             Apache?
>>>
>>>             But, it leads to some questions.
>>>
>>>             First, if $r->read reads unchunked data then why is
>>>             there a
>>>             Transfer-Encoding header saying that the content is
>>>             chunked?   Shouldn't
>>>             that header be removed?   How does one know if the
>>>             content is chunked or
>>>             not, otherwise?
>>>
>>>             Second, if there's no Content-Length header then how
>>>             does one know how much
>>>             data to read using $r->read?
>>>
>>>             One answer is until $r->read returns zero bytes, of
>>>             course.  But, is
>>>             that guaranteed to always be the case, even for,
>>>             say, pipelined requests?
>>>             My guess is yes because whatever is de-chunking the
>>>             request knows to stop
>>>             after reading the last chunk, trailer and empty
>>>             line.   Can anyone elaborate
>>>             on how Apache/mod_perl is doing this?
>>>
>>>
>>>             Perhaps I'm approaching this incorrectly, but this
>>>             is all a bit untidy.
>>>
>>>             I'm using Catalyst and Catalyst needs a
>>>             Content-Length.  So, I have a Plack
>>>             Middleware component that creates a temporary file
>>>             writing the buffer from
>>>             $r->read( my $buffer, 64 * 1024 ) until that returns
>>>             zero bytes.  I pass
>>>             this file handle onto Catalyst.
>>>
>>>             Then, for some content-types, Catalyst (via
>>>             HTTP::Body) writes the body to
>>>             another temp file.    I don't know how
>>>             Apache/mod_perl does its de-chunking,
>>>             but I can call $r->read with a huge buffer length
>>>             and Apache returns that.
>>>              So, maybe Apache is buffering to disk, too.
>>>
>>>             In other words, for each tiny chunked JSON POST or
>>>             PUT I'm creating two (or
>>>             three?) temp files which doesn't seem ideal.
>>>
>>>
>>>             --
>>>             Bill Moseley
>>>             moseley@hank.org
>>>
>>>
>>>
>>>
>>> --
>>> Bill Moseley
>>> moseley@hank.org
>>>
>>>
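
For what it's worth, the read-until-zero-bytes loop described above, combined
with the suggestion of passing an in-memory buffer as a file handle instead of
a temp file, might look roughly like this. It is sketched in Python only so it
runs standalone; `slurp` and its `read` callable are hypothetical stand-ins
for a `$r->read`-style call, not mod_perl or Plack API, and an in-memory
buffer is only reasonable for small bodies such as short JSON requests.

```python
import io


def slurp(read, bufsize=64 * 1024):
    """Call read(bufsize) until it returns no bytes; return (handle, length)."""
    buf = io.BytesIO()
    while True:
        piece = read(bufsize)
        if not piece:
            break  # zero bytes: the (already de-chunked) body is exhausted
        buf.write(piece)
    length = buf.tell()  # total body size, usable as a Content-Length
    buf.seek(0)          # rewind so the consumer reads from the start
    return buf, length


# Stand-in for the request body; a deliberately small buffer forces several reads.
source = io.BytesIO(b'{"a":1}')
handle, length = slurp(source.read, bufsize=4)
print(length, handle.read())  # 7 b'{"a":1}'
```

The length computed after the loop is what would be handed on in place of the
missing Content-Length header.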


-- 
Born in Roswell... married an alien...
http://emptyhammock.com/
