perl-modperl mailing list archives

From Joseph Schaefer <joe_schae...@yahoo.com>
Subject Re: mod_perl and Transfer-Encoding: chunked
Date Wed, 03 Jul 2013 20:42:04 GMT
Dechunked means the server strips out the lines carrying metadata about the next block of raw
data.  The metadata is just the length of the next block of data.  Imagine a chunked stream as
having partial Content-Length headers embedded in the data stream.
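
To make the wire format concrete, here is a minimal sketch in plain Perl. It assumes the whole
chunked body is already in a string, ignores trailers, and uses a hypothetical helper name
(dechunk); it is not how httpd implements its filter, only an illustration of what gets
stripped.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: decode a complete
# chunked body held in a string. Trailers after the last chunk are ignored.
sub dechunk {
    my ($wire) = @_;
    my $body = '';
    my $pos  = 0;
    while (1) {
        # Each chunk starts with a hex length, optional extensions, then CRLF.
        pos($wire) = $pos;
        $wire =~ /\G([0-9A-Fa-f]+)[^\r\n]*\r\n/gc
            or die "malformed chunk-size line at offset $pos";
        my $len = hex $1;
        $pos = pos($wire);
        last if $len == 0;              # a zero-length chunk ends the body
        die "truncated chunk" if $pos + $len + 2 > length $wire;
        $body .= substr($wire, $pos, $len);
        $pos  += $len + 2;              # skip the data and its trailing CRLF
    }
    return $body;
}

my $wire = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
print dechunk($wire), "\n";             # prints "Wikipedia"
```

The "4" and "5" lines are the metadata; everything else up to the zero-length chunk is the raw
data a handler would see.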

The HTTP filter built into httpd takes care of the metadata so you don't have to parse the
stream yourself. $r->read will always provide only the raw data in a blocking call, until
the stream is finished, in which case it should return 0 or an error code.  Check the mod_perl
docs, or better the source, to see if the semantics are more like Perl's sysread or more like
read.
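
A read-until-zero loop might look like the sketch below. To keep it runnable outside Apache it
reads from an in-memory filehandle through a callback; slurp_body and the callback shape are
hypothetical names for illustration. In a handler the callback would presumably wrap $r->read
instead, and how errors surface there (return value or exception) is exactly what should be
checked against the mod_perl docs.

```perl
use strict;
use warnings;

# Hypothetical helper: call $read_cb->($buf, $len) repeatedly until it
# reports 0 bytes, accumulating everything that was read.
sub slurp_body {
    my ($read_cb, $block_size) = @_;
    my $body = '';
    while (1) {
        my $buf = '';
        my $n = $read_cb->($buf, $block_size);   # fills $buf via @_ aliasing
        die "read failed" unless defined $n;     # built-in read returns undef on error
        last if $n == 0;                         # 0 bytes means end of stream
        $body .= $buf;
    }
    return $body;
}

# Stand-in for a request body: an in-memory filehandle.
my $data = 'x' x 150_000;
open my $fh, '<', \$data or die "open: $!";
my $body = slurp_body(sub { read($fh, $_[0], $_[1]) }, 64 * 1024);
print length($body), "\n";                       # prints 150000
```

The block size only bounds each call; the loop ends on the zero-byte read, not on any
Content-Length.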


On Jul 3, 2013, at 4:31 PM, Jim Schueler <jschueler@eloquency.com> wrote:

> In light of Joe Schaefer's response, I appear to be outgunned.  So, if
> nothing else, can someone please clarify whether "de-chunked" means
> re-assembled?
> 
> -Jim
> 
> On Wed, 3 Jul 2013, Jim Schueler wrote:
> 
>> Thanks for the prompt response, but this is your question, not mine.
>> I hardly need an RTFM for my trouble.
>> 
>> I drew my conclusions using a packet sniffer.  And as far-fetched as
>> my answer may seem, it's more plausible than your theory that Apache
>> or modperl is decoding a raw socket stream.
>> 
>> The crux of your question seems to be how the request content gets
>> magically re-assembled.  I don't think it was ever disassembled in the
>> first place.  But if you don't like my answer, and you don't want to
>> ignore it either, then please restate the question.  I can't find any
>> definition for unchunked, and Wiktionary's definition of de-chunk says
>> to "break apart a chunk", that is (counter-intuitively) chunk a chunk.
>> 
>> 
>>>           Second, if there's no Content-Length header then how does
>>>           one know how much data to read using $r->read?
>>> 
>>>           One answer is until $r->read returns zero bytes, of
>>>           course.  But, is that guaranteed to always be the case,
>>>           even for, say, pipelined requests?
>>>           My guess is yes because whatever is de-chunking the
>> 
>> read() is blocking.  So it never returns 0, even in a pipeline request
>> (if no data is available, it simply waits).  I don't wish to discuss
>> the merits here, but there is no technical imperative for a
>> Content-Length header in the request.
>> 
>> -Jim
>> 
>> On Wed, 3 Jul 2013, Bill Moseley wrote:
>> 
>>> Hi Jim,
>>> This is the Transfer-Encoding: chunked I was writing about:
>>> http://tools.ietf.org/html/rfc2616#section-3.6.1
>>> On Wed, Jul 3, 2013 at 11:34 AM, Jim Schueler <jschueler@eloquency.com>
>>> wrote:
>>>     I played around with chunking recently in the context of media
>>>     streaming: The client is only requesting a "chunk" of data.
>>>      "Chunking" is how media players perform a "seek".  It was
>>>     originally implemented for FTP transfers:  E.g, to transfer a
>>>     large file in (say 10K) chunks.  In the case that you describe
>>>     below, if no Content-Length is specified, that indicates "send
>>>     the remainder".
>>> 
>>>     From what I know, a "chunk" request header is used this way to
>>>     specify the server response.  It does not reflect anything about
>>>     the data included in the body of the request.  So first, I would
>>>     ask if you're confused about this request information.
>>> 
>>>     Hypothetically, some browsers might try to upload large files in
>>>     small chunks and the "chunk" header might reflect a push
>>>     transfer.  I don't know if "chunk" is ever used for this
>>>     purpose.  But it would require the following characteristics:
>>> 
>>>       1.  The browser would need to originally inquire if the server
>>>           is capable of this type of request.
>>>       2.  Each chunk of data will arrive in a separate and
>>>           independent HTTP request.  Not necessarily in the order
>>>           they were sent.
>>>       3.  Two or more requests may be handled by separate processes
>>>           simultaneously that can't be written into a single
>>>           destination.
>>>       4.  Somehow the server needs to request a resend if a chunk is
>>>           missing.  Solving this problem requires an imaginative use
>>>           of HTTP.
>>> 
>>>     Sounds messy.  But might be appropriate for 100M+ sized uploads.
>>>      This *may* reflect your situation.  Can you please confirm?
>>> 
>>>     For a single process, the incoming content-length is
>>>     unnecessary. Buffered I/O automatically knows when transmission
>>>     is complete.  The read() argument is the buffer size, not the
>>>     content length.  Whether you spool the buffer to disk or simply
>>>     enlarge the buffer should be determined by your hardware
>>>     capabilities.  This is standard IO behavior that has nothing to
>>>     do with HTTP chunk.  Without a "Content-Length" header, after
>>>     looping your read() operation, determine the length of the
>>>     aggregate data and pass that to Catalyst.
>>> 
>>>     But if you're confident that the complete request spans several
>>>     smaller (chunked) HTTP requests, you'll need to address all the
>>>     problems I've described above, plus the problem of re-assembling
>>>     the whole thing for Catalyst.  I don't know anything about
>>>     Plack, maybe it can perform all this required magic.
>>> 
>>>     Otherwise, if the whole purpose of the Plack temporary file is
>>>     to pass a file handle, you can pass a buffer as a file handle.
>>>      Used to be IO::String, but now that functionality is built into
>>>     the core.
>>> 
>>>     By your last paragraph, I'm really lost.  Since you're already
>>>     passing the request as a file handle, I'm guessing that Catalyst
>>>     creates the temporary file for the *response* body.  Can you
>>>     please clarify?  Also, what do you mean by "de-chunking"?  Is
>>>     that the same thing as re-assembling?
>>> 
>>>     Wish I could give a better answer.  Let me know if this helps.
>>> 
>>>     -Jim
>>> 
>>>     On Tue, 2 Jul 2013, Bill Moseley wrote:
>>> 
>>>           For requests that are chunked (Transfer-Encoding: chunked
>>>           and no Content-Length header) calling $r->read returns
>>>           unchunked data from the socket.
>>>           That's indeed handy.  Is that mod_perl doing that
>>>           un-chunking or is it Apache?
>>> 
>>>           But, it leads to some questions.   
>>> 
>>>           First, if $r->read reads unchunked data then why is there
>>>           a Transfer-Encoding header saying that the content is
>>>           chunked?  Shouldn't that header be removed?  How does one
>>>           know if the content is chunked or not, otherwise?
>>> 
>>>           Second, if there's no Content-Length header then how does
>>>           one know how much data to read using $r->read?
>>> 
>>>           One answer is until $r->read returns zero bytes, of
>>>           course.  But, is that guaranteed to always be the case,
>>>           even for, say, pipelined requests?
>>>           My guess is yes because whatever is de-chunking the
>>>           request knows to stop after reading the last chunk,
>>>           trailer and empty line.  Can anyone elaborate on how
>>>           Apache/mod_perl is doing this?
>>> 
>>>           Perhaps I'm approaching this incorrectly, but this
>>>           is all a bit untidy.
>>> 
>>>           I'm using Catalyst and Catalyst needs a Content-Length.
>>>           So, I have a Plack Middleware component that creates a
>>>           temporary file writing the buffer from
>>>           $r->read( my $buffer, 64 * 1024 ) until that returns zero
>>>           bytes.  I pass this file handle onto Catalyst.
>>> 
>>>           Then, for some content-types, Catalyst (via HTTP::Body)
>>>           writes the body to another temp file.  I don't know how
>>>           Apache/mod_perl does its de-chunking, but I can call
>>>           $r->read with a huge buffer length and Apache returns
>>>           that.  So, maybe Apache is buffering to disk, too.
>>> 
>>>           In other words, for each tiny chunked JSON POST or PUT
>>>           I'm creating two (or three?) temp files which doesn't
>>>           seem ideal.
>>> 
>>>           --
>>>           Bill Moseley
>>>           moseley@hank.org
>>> --
>>> Bill Moseley
>>> moseley@hank.org
