cocoon-users mailing list archives

From "J.Pietschmann" <>
Subject Re: URL Theory & Best Practices
Date Sat, 30 Nov 2002 20:36:10 GMT
Kjetil Kjernsmo wrote:
> So, I've got this bad feeling that IE is going 
> to ignore the content-type header ...
> But I can't for the life of me understand how it can be
> standards-compliant...  

Well, IEx does not in general ignore the content-type
header, and it is, more or less, standards compliant,
just in a somewhat special way.
From various rumours and gossip I compiled the following
story: IEx uses a variety of COM components for handling
content. A correct implementation would open the network
connection, read the headers including the content-type
header, decide which component handles the content, and
then hand the relevant headers and the open connection to
that component.

It seems that handing open connections to arbitrary COM
components is difficult, or was difficult at the time the
architecture of IEx was decided. Therefore the browser
component looks at the URL, extracts what it thinks could
be a "file extension", looks up whatever component is
registered for this string in the Windows registry (note
that MIME types are not keys there), and then hands the URL
to that component. Obviously it's up to the component what
happens if the content type does not match one of the types
it can handle, or whether it honors the content-type header
at all. In many cases a mismatch causes the connection to be
closed, and another component, determined by the content
type, gets the URL. BTW, this is the mechanism the Klez
virus uses to get into Windows systems.

Some components seem to take a second look at the URL, and
sometimes they return errors or something else that causes
the browser component to fall back to the default HTML
renderer, which then most often draws a blank. Caching
plays a role too. Also, the algorithms for extracting a
"file extension", and perhaps content negotiation, seem to
be implemented multiple times, probably in different ways,
in various components; or perhaps the components don't
always have access to necessary data (like cookies).

The user usually doesn't notice anything. Problems arise
if the URL points to dynamic content, where a second GET
can retrieve something different, in particular if the
content wasn't completely read or wasn't cached for other
reasons (like SSL).
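To make the difference between the two dispatch strategies concrete, here is a minimal Python sketch. It contrasts choosing a handler from a "file extension" guessed from the URL with choosing it from the content-type header. The handler tables and names (HANDLERS_BY_EXTENSION, "pdf-viewer", etc.) are hypothetical and purely illustrative; they are not IE's actual components or registry keys, and real browsers may extract the extension differently.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical handler tables, for illustration only (not real registry data).
HANDLERS_BY_EXTENSION = {".pdf": "pdf-viewer", ".html": "html-renderer"}
HANDLERS_BY_MIME = {"application/pdf": "pdf-viewer", "text/html": "html-renderer"}
DEFAULT_HANDLER = "html-renderer"


def guess_extension(url: str) -> str:
    """Return what looks like a file extension in the URL path.

    The query string is ignored in this sketch."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower()


def dispatch_by_extension(url: str) -> str:
    """URL-based dispatch: the handler is picked before any headers are read."""
    return HANDLERS_BY_EXTENSION.get(guess_extension(url), DEFAULT_HANDLER)


def dispatch_by_content_type(content_type: str) -> str:
    """Header-based dispatch: the handler is picked from the content-type header,
    ignoring parameters such as charset."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return HANDLERS_BY_MIME.get(mime, DEFAULT_HANDLER)


if __name__ == "__main__":
    # A dynamic URL with no extension that actually serves a PDF.
    url = "http://example.com/report?id=7"
    print(dispatch_by_extension(url))                  # html-renderer
    print(dispatch_by_content_type("application/pdf"))  # pdf-viewer
```

With a dynamic URL such as `http://example.com/report?id=7` that serves `application/pdf`, the URL-based guess falls back to the HTML renderer while the header-based choice picks the PDF handler. A URL ending in `.pdf` gets the same handler either way, which is why the mismatch mostly bites dynamic content.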

Disclaimer: most of the above is second-hand knowledge.
