On Wed, Jul 3, 2013 at 4:31 PM, Jim Schueler wrote:
In light of Joe Schaefer's response, I appear to be outgunned.  So, if nothing else, can someone please clarify whether "de-chunked" means re-assembled?

Yes, where re-assembled means converting it back to the original data stream, without any sort of transport encoding.
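To make "converting it back to the original data stream" concrete, here is a minimal sketch of de-chunking as described in RFC 2616 section 3.6.1. It is written in Python purely for illustration and is not how Apache or mod_perl implements it; the function name `dechunk` and the sample bytes are made up for this example.

```python
import io

def dechunk(stream):
    """Decode an HTTP/1.1 chunked body from a binary stream; return the payload."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the last chunk")
        # The chunk size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        body += data
        if stream.readline() != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
    # Optional trailer headers follow the last chunk, ended by an empty line.
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):
            break
    return bytes(body)

# Hypothetical wire data: two chunks, the terminator, then a pipelined request.
wire = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\nGET /next HTTP/1.1\r\n")
payload = dechunk(wire)
rest = wire.read()
```

The decoder stops after the zero-size chunk, any trailer, and the empty line, so bytes belonging to a following pipelined request are left unread in `rest`.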

-Jim

On Wed, 3 Jul 2013, Jim Schueler wrote:

Thanks for the prompt response, but this is your question, not mine.  I hardly need an RTFM for my trouble.

I drew my conclusions using a packet sniffer.  And as far-fetched as my answer may seem, it's more plausible than your theory that Apache or modperl is decoding a raw socket stream.

The crux of your question seems to be how the request content gets
magically re-assembled.  I don't think it was ever disassembled in the first place.  But if you don't like my answer, and you don't want to ignore it either, then please restate the question.  I can't find any definition for unchunked, and Wiktionary's definition of de-chunk says to "break apart a chunk", that is (counter-intuitively) chunk a chunk.

            Second, if there's no Content-Length header then how
            does one know how much
            data to read using $r->read?

            One answer is until $r->read returns zero bytes, of
            course.  But, is
            that guaranteed to always be the case, even for,
            say, pipelined requests?
            My guess is yes because whatever is de-chunking the

read() is blocking.  So it never returns 0, even in a pipeline request (if no data is available, it simply waits).  I don't wish to discuss the merits here, but there is no technical imperative for a content-length request in the request header.

-Jim

On Wed, 3 Jul 2013, Bill Moseley wrote:

Hi Jim,
This is the Transfer-Encoding: chunked I was writing about:

http://tools.ietf.org/html/rfc2616#section-3.6.1

On Wed, Jul 3, 2013 at 11:34 AM, Jim Schueler <jschueler@eloquency.com>
wrote:
      I played around with chunking recently in the context of media
      streaming: The client is only requesting a "chunk" of data.
      "Chunking" is how media players perform a "seek".  It was
      originally implemented for FTP transfers: e.g., to transfer a
      large file in (say 10K) chunks.  In the case that you describe
      below, if no Content-Length is specified, that indicates "send
      the remainder".

      From what I know, a "chunk" request header is used this way to
      specify the server response.  It does not reflect anything about
      the data included in the body of the request.  So first, I would

      Hypothetically, some browsers might try to upload large files in
      small chunks and the "chunk" header might reflect a push
      transfer.  I don't know if "chunk" is ever used for this
      purpose.  But it would require the following characteristics:

        1.  The browser would need to originally inquire if the server
            is capable of this type of request.
        2.  Each chunk of data will arrive in a separate and
            independent HTTP request.  Not necessarily in the order
            they were sent.
        3.  Two or more requests may be handled by separate processes
            simultaneously that can't be written into a single
            destination.
        4.  Somehow the server needs to request a resend if a chunk is
            missing.  Solving this problem requires an imaginative use
            of HTTP.

      Sounds messy.  But might be appropriate for 100M+ sized uploads.
      This *may* reflect your situation.  Can you please confirm?

      For a single process, the incoming content-length is
      unnecessary. Buffered I/O automatically knows when transmission
      is complete.  The read() argument is the buffer size, not the
      content length.  Whether you spool the buffer to disk or simply
      enlarge the buffer should be determined by your hardware
      capabilities.  This is standard IO behavior that has nothing to
      do with HTTP chunk.  Without a "Content-Length" header, after
      looping your read() operation, determine the length of the
      aggregate data and pass that to Catalyst.
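      As a rough illustration of the read loop described above, here is a Python sketch (not mod_perl code) that reads fixed-size blocks until the stream reports end of data and then measures the aggregate. The name `slurp` and the in-memory stream are assumptions made for the example; an `io.BytesIO` stands in for the request body.

```python
import io

def slurp(stream, bufsize=64 * 1024):
    """Read a binary stream in bufsize blocks; return (data, length)."""
    parts = []
    while True:
        block = stream.read(bufsize)  # bufsize is a buffer size, not a content length
        if not block:                 # an empty read signals end of data
            break
        parts.append(block)
    data = b"".join(parts)
    return data, len(data)

# Hypothetical 150,000-byte body read through a 64 KiB buffer.
data, length = slurp(io.BytesIO(b"x" * 150_000))
```

      The length is known only after the loop finishes, which is why the whole body has to be buffered (in memory or on disk) before a Content-Length can be supplied.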

      But if you're confident that the complete request spans several
      smaller (chunked) HTTP requests, you'll need to address all the
      problems I've described above, plus the problem of re-assembling
      the whole thing for Catalyst.  I don't know anything about
      Plack, maybe it can perform all this required magic.

      Otherwise, if the whole purpose of the Plack temporary file is
      to pass a file handle, you can pass a buffer as a file handle.
      Used to be IO::String, but now that functionality is built into
      the core.

      By your last paragraph, I'm really lost.  Since you're already
      passing the request as a file handle, I'm guessing that Catalyst
      creates the temporary file for the *response* body.  Can you
      please clarify?  Also, what do you mean by "de-chunking"?  Is
      that the same thing as re-assembling?

      Wish I could give a better answer.  Let me know if this helps.

      -Jim

      On Tue, 2 Jul 2013, Bill Moseley wrote:

            For requests that are chunked (Transfer-Encoding: chunked and no
            Content-Length header) calling $r->read returns
            unchunked data from the socket.
            That's indeed handy.  Is that mod_perl doing that
            un-chunking or is it Apache?

            But, it leads to some questions.

            First, if $r->read reads unchunked data then why is there a
            Transfer-Encoding header saying that the content is
            chunked?  Shouldn't that header be removed?  How does one
            know if the content is chunked or not, otherwise?

            Second, if there's no Content-Length header then how
            does one know how much data to read using $r->read?

            One answer is until $r->read returns zero bytes, of
            course.  But, is that guaranteed to always be the case,
            even for, say, pipelined requests?
            My guess is yes because whatever is de-chunking the
            request knows to stop after reading the last chunk,
            trailer and empty line.  Can anyone elaborate
            on how Apache/mod_perl is doing this?

            Perhaps I'm approaching this incorrectly, but this
            is all a bit untidy.

            I'm using Catalyst and Catalyst needs a
            Content-Length.  So, I have a Plack
            Middleware component that creates a temporary file
            writing the buffer from
            $r->read( my $buffer, 64 * 1024 ) until that returns
            zero bytes.  I pass
            this file handle onto Catalyst.

            Then, for some content-types, Catalyst (via
            HTTP::Body) writes the body to
            another temp file.  I don't know how
            Apache/mod_perl does its de-chunking,
            but I can call $r->read with a huge buffer length
            and Apache returns that.
            So, maybe Apache is buffering to disk, too.

            In other words, for each tiny chunked JSON POST or
            PUT I'm creating two (or
            three?) temp files which doesn't seem ideal.

            --
            Bill Moseley
            moseley@hank.org

--
Bill Moseley
moseley@hank.org

--
Born in Roswell... married an alien...
http://emptyhammock.com/