On Wed, Jul 3, 2013 at 4:31 PM, Jim Schueler <jschueler@eloquency.com> wrote:
In light of Joe Schaefer's response, I appear to be outgunned.  So, if nothing else, can someone please clarify whether "de-chunked" means re-assembled?

Yes, where "re-assembled" means converting it back to the original data stream without any sort of transport encoding.
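To make "transport encoding" concrete, here is a minimal sketch of what chunked transfer coding wraps around a body. It is written in Python only so that it runs standalone. chunk_encode and the 4-byte chunk size are illustrative assumptions, not anything Apache or mod_perl exposes.

```python
def chunk_encode(data: bytes, chunk_size: int = 4) -> bytes:
    """Wrap data in HTTP/1.1 chunked transfer coding (RFC 2616 section 3.6.1).

    Each chunk is: hex size, CRLF, the bytes, CRLF. A zero-size chunk
    followed by an empty line (no trailers here) ends the body.
    """
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        piece = data[start:start + chunk_size]
        out += b"%x\r\n" % len(piece)  # chunk size in hexadecimal
        out += piece + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk plus the empty line ending the trailer
    return bytes(out)


encoded = chunk_encode(b'{"a": 1}')
print(encoded)  # b'4\r\n{"a"\r\n4\r\n: 1}\r\n0\r\n\r\n'
```

De-chunking is the reverse: drop the hex size lines and the CRLFs and concatenate the pieces, which gives back b'{"a": 1}' exactly.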
 


 -Jim


On Wed, 3 Jul 2013, Jim Schueler wrote:

Thanks for the prompt response, but this is your question, not mine.  I hardly need an RTFM for my trouble.

I drew my conclusions using a packet sniffer.  And as far-fetched as my answer may seem, it's more plausible than your theory that Apache or modperl is decoding a raw socket stream.

The crux of your question seems to be how the request content gets magically re-assembled.  I don't think it was ever disassembled in the first place.  But if you don't like my answer, and you don't want to ignore it either, then please restate the question.  I can't find any definition of "unchunked", and Wiktionary's definition of "de-chunk" says to "break apart a chunk", that is (counter-intuitively) to chunk a chunk.


            Second, if there's no Content-Length header then how does one
            know how much data to read using $r->read?

            One answer is until $r->read returns zero bytes, of course.
            But, is that guaranteed to always be the case, even for, say,
            pipelined requests?  My guess is yes because whatever is
            de-chunking the

read() is blocking, so it never returns 0, even in a pipelined request (if no data is available, it simply waits).  I don't wish to discuss the merits here, but there is no technical requirement for a Content-Length header in the request.

-Jim






On Wed, 3 Jul 2013, Bill Moseley wrote:

Hi Jim,
This is the Transfer-Encoding: chunked I was writing about:

http://tools.ietf.org/html/rfc2616#section-3.6.1
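A rough decoder for the chunked-body grammar in that section might look like the following. It is a simplified Python sketch that assumes the whole body is already in memory; chunk_decode is a hypothetical name, and chunk extensions and trailers are parsed only enough to skip them.

```python
def chunk_decode(body: bytes) -> bytes:
    """Decode a complete chunked message body (RFC 2616 section 3.6.1).

    Sketch only: chunk extensions after ';' are ignored, trailers are
    skipped, and malformed input raises ValueError.
    """
    data = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)              # end of the chunk-size line
        size_field = body[pos:eol].split(b";", 1)[0]
        size = int(size_field.strip(), 16)           # chunk size is hexadecimal
        pos = eol + 2
        if size == 0:
            break                                    # last-chunk reached
        data += body[pos:pos + size]
        pos += size
        if body[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2
    # Skip optional trailer headers up to the empty line that ends the body.
    while True:
        eol = body.index(b"\r\n", pos)
        if eol == pos:
            break
        pos = eol + 2
    return bytes(data)


print(chunk_decode(b"5;name=x\r\nhello\r\n0\r\nExpires: never\r\n\r\n"))  # b'hello'
```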



On Wed, Jul 3, 2013 at 11:34 AM, Jim Schueler <jschueler@eloquency.com> wrote:
      I played around with chunking recently in the context of media
      streaming: The client is only requesting a "chunk" of data.
      "Chunking" is how media players perform a "seek".  It was
      originally implemented for FTP transfers, e.g., to transfer a
      large file in (say 10K) chunks.  In the case that you describe
      below, if no Content-Length is specified, that indicates "send
      the remainder".

      From what I know, a "chunk" request header is used this way to
      specify the server response.  It does not reflect anything about
      the data included in the body of the request.  So first, I would
      ask whether you're confused about this request information.

      Hypothetically, some browsers might try to upload large files in
      small chunks and the "chunk" header might reflect a push
      transfer.  I don't know if "chunk" is ever used for this
      purpose.  But it would require the following characteristics:

        1.  The browser would first need to ask whether the server is
            capable of this type of request.
        2.  Each chunk of data will arrive in a separate and independent
            HTTP request, not necessarily in the order they were sent.
        3.  Two or more requests may be handled simultaneously by
            separate processes that can't write into a single destination.
        4.  Somehow the server needs to request a resend if a chunk is
            missing.  Solving this problem requires an imaginative use of
            HTTP.

      Sounds messy.  But might be appropriate for 100M+ sized uploads.
       This *may* reflect your situation.  Can you please confirm?

      For a single process, the incoming content-length is
      unnecessary. Buffered I/O automatically knows when transmission
      is complete.  The read() argument is the buffer size, not the
      content length.  Whether you spool the buffer to disk or simply
      enlarge the buffer should be determined by your hardware
      capabilities.  This is standard IO behavior that has nothing to
      do with HTTP chunk.  Without a "Content-Length" header, after
      looping your read() operation, determine the length of the
      aggregate data and pass that to Catalyst.
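The read-until-done loop described above can be sketched in Python as follows. read_all is a hypothetical helper, and the sketch assumes the stream signals end-of-body with an empty read, as Python file objects do at EOF. The 64K figure only sets the buffer size, not the expected length.

```python
import io

BUFFER_SIZE = 64 * 1024  # read() size is a buffer size, not a content length


def read_all(stream, buffer_size=BUFFER_SIZE):
    """Read a body of unknown length and return (data, length)."""
    parts = []
    while True:
        buf = stream.read(buffer_size)
        if not buf:  # assumption: an empty read marks the end of the body
            break
        parts.append(buf)
    data = b"".join(parts)
    return data, len(data)  # length is only known after the loop finishes


body, length = read_all(io.BytesIO(b"x" * 150_000))
print(length)  # 150000
```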

      But if you're confident that the complete request spans several
      smaller (chunked) HTTP requests, you'll need to address all the
      problems I've described above, plus the problem of re-assembling
      the whole thing for Catalyst.  I don't know anything about
      Plack; maybe it can perform all this required magic.

      Otherwise, if the whole purpose of the Plack temporary file is
      to pass a file handle, you can pass a buffer as a file handle.
      This used to require IO::String, but that functionality is now
      built into the core.
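Presumably the core feature meant here is Perl's in-memory filehandle, opened with open(my $fh, '<', \$buffer). The Python analogue is io.BytesIO; the sketch below uses a made-up JSON payload to show that a buffer can stand in for a file handle and still yield a length for Content-Length without touching disk.

```python
import io
import json

payload = b'{"name": "example"}'  # hypothetical request body already in memory

fh = io.BytesIO(payload)      # file-like object backed by the bytes, no temp file
fh.seek(0, io.SEEK_END)
content_length = fh.tell()    # measure the body without touching disk
fh.seek(0)                    # rewind so the consumer reads from the start

print(content_length)         # 19
print(json.load(fh)["name"])  # example
```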

      Your last paragraph really lost me.  Since you're already
      passing the request as a file handle, I'm guessing that Catalyst
      creates the temporary file for the *response* body.  Can you
      please clarify?  Also, what do you mean by "de-chunking"?  Is
      that the same thing as re-assembling?

      Wish I could give a better answer.  Let me know if this helps.

      -Jim


      On Tue, 2 Jul 2013, Bill Moseley wrote:

            For requests that are chunked (Transfer-Encoding: chunked and
            no Content-Length header) calling $r->read returns unchunked
            data from the socket.
            That's indeed handy.  Is that mod_perl doing that un-chunking
            or is it Apache?

            But, it leads to some questions.   

            First, if $r->read reads unchunked data then why is there a
            Transfer-Encoding header saying that the content is chunked?
            Shouldn't that header be removed?  How does one know if the
            content is chunked or not, otherwise?

            Second, if there's no Content-Length header then how does one
            know how much data to read using $r->read?

            One answer is until $r->read returns zero bytes, of course.
            But, is that guaranteed to always be the case, even for, say,
            pipelined requests?  My guess is yes because whatever is
            de-chunking the request knows to stop after reading the last
            chunk, trailer and empty line.  Can anyone elaborate on how
            Apache/mod_perl is doing this?
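Under RFC 2616 the chunked body is self-delimiting: the zero-size last chunk, any trailer lines and the final empty line mark its end, whatever follows on the connection. A hedged Python sketch of a stream-based decoder (read_chunked is hypothetical and is not how Apache's input filter is actually written) shows that the bytes of a pipelined next request stay unread. It assumes read(n) returns n bytes unless the connection closes, as buffered binary streams do.

```python
import io


def read_chunked(stream):
    """Consume exactly one chunked body from a binary stream.

    Stops after the last chunk, any trailer lines, and the final empty
    line, so bytes of a following (pipelined) request are left unread.
    """
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise EOFError("connection closed mid-body")
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last-chunk: no more data follows
        chunk = stream.read(size)
        if len(chunk) != size:
            raise EOFError("connection closed inside a chunk")
        body += chunk
        stream.readline()  # CRLF that terminates the chunk data
    while stream.readline() not in (b"\r\n", b""):
        pass  # skip trailer headers up to the empty line
    return bytes(body)


wire = io.BytesIO(b"3\r\nabc\r\n0\r\n\r\nGET /next HTTP/1.1\r\n")
print(read_chunked(wire))  # b'abc'
print(wire.read())         # b'GET /next HTTP/1.1\r\n'
```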


            Perhaps I'm approaching this incorrectly, but this is all a bit
            untidy.

            I'm using Catalyst and Catalyst needs a Content-Length.  So, I
            have a Plack Middleware component that creates a temporary file
            writing the buffer from $r->read( my $buffer, 64 * 1024 ) until
            that returns zero bytes.  I pass this file handle onto Catalyst.
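The spool-then-measure step described above might look like this in Python. spool_body is a hypothetical helper, and tempfile.SpooledTemporaryFile is swapped in for a plain temporary file because it keeps small bodies in memory and only rolls over to disk past max_size, which is one possible way to avoid a temp file for each tiny JSON POST. The 1 MB threshold is an arbitrary assumption.

```python
import io
import tempfile

BUFFER_SIZE = 64 * 1024


def spool_body(stream, max_in_memory=1024 * 1024):
    """Copy a body of unknown length into a seekable file and measure it.

    SpooledTemporaryFile stays in memory until more than max_in_memory
    bytes are written, then rolls over to a real temporary file.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    length = 0
    while True:
        buf = stream.read(BUFFER_SIZE)
        if not buf:  # empty read: end of body
            break
        spool.write(buf)
        length += len(buf)
    spool.seek(0)  # rewind so the next consumer reads from the start
    return spool, length


fh, content_length = spool_body(io.BytesIO(b'{"id": 42}'))
print(content_length)  # 10
print(fh.read())       # b'{"id": 42}'
```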

            Then, for some content-types, Catalyst (via HTTP::Body) writes
            the body to another temp file.  I don't know how Apache/mod_perl
            does its de-chunking, but I can call $r->read with a huge buffer
            length and Apache returns that.  So, maybe Apache is buffering
            to disk, too.

            In other words, for each tiny chunked JSON POST or PUT I'm
            creating two (or three?) temp files which doesn't seem ideal.


            --
            Bill Moseley
            moseley@hank.org




--
Bill Moseley
moseley@hank.org




--
Born in Roswell... married an alien...
http://emptyhammock.com/