perl-modperl mailing list archives

From Issac Goldstand <mar...@beamartyr.net>
Subject Re: mod_perl and Transfer-Encoding: chunked
Date Thu, 04 Jul 2013 08:37:04 GMT
On 03/07/2013 21:53, Joseph Schaefer wrote:
> When you read from the input filter chain as $r->read does, the http
> input filter automatically handles the protocol and passes the dechunked
> data up to the caller. It does not spool the stream at all.
> 
> You'd have to look at how mod perl implements read to see if it loops
> its ap_get_brigade calls on the input filter chain to fill the passed
> buffer to the desired length or not.  But under no circumstances should
> you have to deal with chunked data directly.

I'm pretty sure that it's not even a mod_perl thing.  IIRC, httpd itself
sticks a chunk/de-chunk filter near the respective ends of the filter
chain.  So if you can't find the code in mod_perl land, you might want
to check httpd source.
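
To make the idea concrete, here is a rough, self-contained sketch of what a de-chunking step does with the wire format from RFC 2616 section 3.6.1. It is an illustration only, written in Python for brevity, and is not httpd's or mod_perl's actual code; the function name `dechunk` and the sample bytes are made up for the example. The decoder stops after the last-chunk marker, any trailer and the closing empty line, so bytes belonging to a pipelined follow-up request are left untouched on the stream.

```python
import io

def dechunk(stream):
    """Yield payload bytes from an HTTP/1.1 chunked body on a binary stream."""
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the last-chunk marker")
        # Chunk size is hexadecimal; text after ';' is a chunk extension, ignored here.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last-chunk marker
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        yield data
        if stream.readline() != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
    # Skip optional trailer fields up to the empty line that ends the body.
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):
            break

# Hypothetical wire data: one chunked body followed by the start of a pipelined request.
wire = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\nGET /next HTTP/1.1\r\n")
body = b"".join(dechunk(wire))   # payload with the chunk framing removed
rest = wire.read()               # whatever follows the body is still unread
```

The caller only ever sees `body`; the framing never reaches it, which matches what $r->read is reported to return in this thread.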

> 
> HTH
> 
> Sent from my iPhone
> 
> On Jul 3, 2013, at 2:44 PM, Bill Moseley <moseley@hank.org> wrote:
> 
>> Hi Jim,
>>
>> This is the Transfer-Encoding: chunked I was writing about:
>>
>> http://tools.ietf.org/html/rfc2616#section-3.6.1
>>
>>
>>
>> On Wed, Jul 3, 2013 at 11:34 AM, Jim Schueler <jschueler@eloquency.com> wrote:
>>
>>     I played around with chunking recently in the context of media
>>     streaming: The client is only requesting a "chunk" of data.
>>     "Chunking" is how media players perform a "seek".  It was
>>     originally implemented for FTP transfers, e.g., to transfer a
>>     large file in (say 10K) chunks.  In the case that you describe
>>     below, if no Content-Length is specified, that indicates "send the
>>     remainder".
>>
>>     From what I know, a "chunk" request header is used this way to
>>     specify the server response.  It does not reflect anything about
>>     the data included in the body of the request.  So first, I would
>>     ask if you're confused about this request information.
>>
>>     Hypothetically, some browsers might try to upload large files in
>>     small chunks and the "chunk" header might reflect a push transfer.
>>      I don't know if "chunk" is ever used for this purpose.  But it
>>     would require the following characteristics:
>>
>>       1.  The browser would need to originally inquire if the server is
>>           capable of this type of request.
>>       2.  Each chunk of data will arrive in a separate and independent
>>           HTTP request.  Not necessarily in the order they were sent.
>>       3.  Two or more requests may be handled by separate processes
>>           simultaneously that can't be written into a single destination.
>>       4.  Somehow the server needs to request a resend if a chunk is
>>           missing.  Solving this problem requires an imaginative use
>>           of HTTP.
>>
>>     Sounds messy.  But might be appropriate for 100M+ sized uploads.
>>      This *may* reflect your situation.  Can you please confirm?
>>
>>     For a single process, the incoming content-length is unnecessary.
>>     Buffered I/O automatically knows when transmission is complete.
>>      The read() argument is the buffer size, not the content length.
>>      Whether you spool the buffer to disk or simply enlarge the buffer
>>     should be determined by your hardware capabilities.  This is
>>     standard IO behavior that has nothing to do with HTTP chunk.
>>      Without a "Content-Length" header, after looping your read()
>>     operation, determine the length of the aggregate data and pass
>>     that to Catalyst.
>>
>>     But if you're confident that the complete request spans several
>>     smaller (chunked) HTTP requests, you'll need to address all the
>>     problems I've described above, plus the problem of re-assembling
>>     the whole thing for Catalyst.  I don't know anything about Plack,
>>     maybe it can perform all this required magic.
>>
>>     Otherwise, if the whole purpose of the Plack temporary file is to
>>     pass a file handle, you can pass a buffer as a file handle.  Used
>>     to be IO::String, but now that functionality is built into the core.
>>
>>     By your last paragraph, I'm really lost.  Since you're already
>>     passing the request as a file handle, I'm guessing that Catalyst
>>     creates the temporary file for the *response* body.  Can you
>>     please clarify?  Also, what do you mean by "de-chunking"?  Is that
>>     the same thing as re-assembling?
>>
>>     Wish I could give a better answer.  Let me know if this helps.
>>
>>     -Jim
>>
>>
>>
>>     On Tue, 2 Jul 2013, Bill Moseley wrote:
>>
>>         For requests that are chunked (Transfer-Encoding: chunked and no
>>         Content-Length header) calling $r->read returns unchunked data
>>         from the socket.  That's indeed handy.  Is that mod_perl doing
>>         that un-chunking or is it Apache?
>>
>>         But, it leads to some questions.
>>
>>         First, if $r->read reads unchunked data then why is there a
>>         Transfer-Encoding header saying that the content is chunked?
>>         Shouldn't that header be removed?  How does one know if the
>>         content is chunked or not, otherwise?
>>
>>         Second, if there's no Content-Length header then how does one
>>         know how much data to read using $r->read?
>>
>>         One answer is until $r->read returns zero bytes, of course.
>>         But, is that guaranteed to always be the case, even for, say,
>>         pipelined requests?  My guess is yes because whatever is
>>         de-chunking the request knows to stop after reading the last
>>         chunk, trailer and empty line.  Can anyone elaborate on how
>>         Apache/mod_perl is doing this?
>>
>>
>>         Perhaps I'm approaching this incorrectly, but this is all a
>>         bit untidy.
>>
>>         I'm using Catalyst and Catalyst needs a Content-Length.  So, I
>>         have a Plack Middleware component that creates a temporary file
>>         writing the buffer from $r->read( my $buffer, 64 * 1024 ) until
>>         that returns zero bytes.  I pass this file handle onto Catalyst.
>>
>>         Then, for some content-types, Catalyst (via HTTP::Body) writes
>>         the body to another temp file.  I don't know how Apache/mod_perl
>>         does its de-chunking, but I can call $r->read with a huge buffer
>>         length and Apache returns that.  So, maybe Apache is buffering
>>         to disk, too.
>>
>>         In other words, for each tiny chunked JSON POST or PUT I'm
>>         creating two (or three?) temp files which doesn't seem ideal.
>>
>>
>>         --
>>         Bill Moseley
>>         moseley@hank.org
>>
>>
>>
>>
>> -- 
>> Bill Moseley
>> moseley@hank.org
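
For reference, the approach discussed in the quoted thread (read in fixed-size pieces until a read returns zero bytes, then derive the length from what was collected) can be sketched with an in-memory buffer standing in for the temporary file. This is a minimal illustration in Python, not mod_perl or Plack code; the helper name `slurp` and the sample body are assumptions made for the example, and `source.read` merely stands in for $r->read delivering de-chunked data.

```python
import io

def slurp(read, bufsize=64 * 1024):
    """Call read(bufsize) until it returns no bytes; return (handle, length)."""
    spool = io.BytesIO()          # in-memory buffer used as a file handle
    while True:
        buf = read(bufsize)
        if not buf:               # zero bytes: the body is complete
            break
        spool.write(buf)
    length = spool.tell()         # aggregate size, usable as a Content-Length
    spool.seek(0)                 # rewind so the consumer reads from the start
    return spool, length

# Hypothetical stand-in for a small de-chunked JSON request body.
source = io.BytesIO(b'{"id": 1}')
handle, content_length = slurp(source.read, bufsize=4)
```

For small bodies such as the JSON POSTs mentioned above, keeping the data in memory like this avoids one temporary file; whether that is acceptable for large uploads depends on available memory.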
