httpd-users mailing list archives

From Charles Sprickman <sp...@bway.net>
Subject Re: [users@httpd] Port-based vhosts
Date Thu, 12 Mar 2009 05:13:01 GMT
On Wed, 11 Mar 2009, André Warnier wrote:

> Charles Sprickman wrote:
> [...]
>  Under what
>> conditions does Apache then get involved and alter the URL?  Just 
>> redirects?  I understand a common redirect is just adding a trailing slash 
>> when the user does not supply it.  What are some other common cases? Whose 
>> call is it when a simple static site uses non-absolute URLs for all the 
>> links?  Is the browser building the fully-qualified links or Apache (I 
>> suspect the former)?
>> 
> If you suspect that it is the browser, you suspect correctly.  But the 
> explanation is somewhat messy (and lengthy) unless you really understand the 
> basics.  Let me try a not entirely correct but hopefully didactic 
> explanation.
>
> Say the browser retrieves a first html page from a server, using the URL 
> "http://server.company.com/mydir/mypage.html".  This URL, from which the 
> browser retrieved the current page, is now for the browser the "base URL" of 
> the currently displayed document.
> Now say that this page contains a relative link like <img 
> src="images/myface.gif" />.
> When the browser loads this image (or when the user clicks a relative link 
> in an <a href>), it constructs a new URL by
> - removing the last component of the base URL (in this case "mypage.html"), 
> leaving "http://server.company.com/mydir/"
> - re-adding to that the relative link "images/myface.gif", giving 
> "http://server.company.com/mydir/images/myface.gif"
> - retrieving this new URL
> Nothing of that happens at the server side.  It's all done at the browser 
> level, any browser.
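
A minimal sketch of the resolution steps above, assuming Python and its standard urllib.parse module (the choice of Python is mine, not part of the original explanation). It builds the new URL once by hand, following the two steps literally, and once with urljoin, which applies the general relative-reference rules.

```python
from urllib.parse import urljoin

base_url = "http://server.company.com/mydir/mypage.html"
relative_link = "images/myface.gif"

# Step 1: remove the last component of the base URL ("mypage.html").
directory = base_url.rsplit("/", 1)[0] + "/"

# Step 2: re-add the relative link to what is left.
manual = directory + relative_link

# The standard library does the same, and also copes with "../",
# links starting with "/", and so on.
resolved = urljoin(base_url, relative_link)

print(manual)
print(resolved)
```

Both lines print the same URL. The server plays no part in this; it only ever sees the request for the resulting path.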
>
> In reality, what happens is a bit different, because in a URL like 
> "http://server.company.com/mydir/mypage.html", there are several parts which 
> are processed differently and independently, and an HTTP request is not really 
> to "http://server.company.com/mydir/mypage.html".  The real HTTP request 
> sequence is more like this :
>
> a) the browser opens a TCP connection to port 80 of the host which has the IP 
> address corresponding to the DNS resolution of the hostname 
> "server.company.com"
>
> b) on that connection, the browser writes an HTTP request like
> GET /mydir/mypage.html HTTP/1.1
> Host: server.company.com
>
> then it switches to read mode and waits for the server's response to arrive 
> on that same connection.
>
> So in my first explanation above, you have to leave out the "protocol" and 
> "host:port" from the current page's base URL, but the general idea remains.
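
As a rough illustration of steps a) and b), here is a sketch that splits a URL into the part used to open the connection (host and port) and the part that is actually written on it (path plus Host header). The helper name build_request is hypothetical, the sketch assumes plain http only, and it does not touch the network.

```python
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, port, request_text) for a plain-http URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80            # no explicit port means port 80 for http
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The Host header carries the port only when it is not the default.
    host_header = host if port == 80 else f"{host}:{port}"
    request_text = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "\r\n"
    )
    return host, port, request_text


host, port, request_text = build_request(
    "http://server.company.com/mydir/mypage.html"
)
print(host, port)
print(request_text)
```

For a port-based vhost, a URL such as "http://server.company.com:8080/" would give port 8080 for the connection and "Host: server.company.com:8080" in the request.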
>
> Now about the redirects, re-using the above logic.
> (These are called "external redirects"; "internal" redirects are covered 
> further down.)
>
> b) the browser sends a request to the server, like
> GET /mydir HTTP/1.1
> Host: server.company.com
>
> c) the server sends a response to the browser, like
> 301 (this thing has moved, permanently)
> Location: /mydir/  (here is the new location)
>
> d) now the browser automatically sends a new request on the same 
> connection :
> GET /mydir/ HTTP/1.1
> Host: server.company.com
>
> e) and, presumably, the server now responds with the requested content.
>
> In addition, if the browser is smart, it will remember that the URL "/mydir" 
> has moved to "/mydir/", and the next time it will request it directly, even 
> if the forgetful user requests "/mydir" again.  It will also show the 
> "/mydir/" in the URL bar for that page, because that is the real URL it got 
> the page from (and in the vain hope of educating the user about the fact that 
> the URL "/mydir" is the wrong one and should not be used anymore).
>
> So, the penalty of using a 301 re-direct is that there is one more round-trip 
> server-browser-server (see c and d above).  But it is a relatively small one, 
> because the content is very short, and because nowadays with keep-alive 
> connections the same TCP connection browser-server can be used for all of it.
> The benefit is that the browser has the correct idea of what the "base URL" 
> is at all times, and thus that it can correctly interpret relative URLs and 
> compose the correct follow-up requests.
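
The b) to e) exchange can be sketched as a small simulation. FAKE_SERVER is a hypothetical stand-in for the server (a lookup table, not Apache), and fetch is an illustrative helper that follows redirects the way the browser is described as doing; no real network traffic is involved.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical server: path -> (status, headers, body).
FAKE_SERVER = {
    "/mydir": (301, {"Location": "/mydir/"}, ""),
    "/mydir/": (200, {}, "directory index"),
}


def fetch(url, max_redirects=5):
    """Follow external redirects; return (final_url, body, round_trips)."""
    round_trips = 0
    for _ in range(max_redirects + 1):
        status, headers, body = FAKE_SERVER[urlsplit(url).path]
        round_trips += 1
        if status in (301, 302) and "Location" in headers:
            # The client, not the server, builds the follow-up URL.
            url = urljoin(url, headers["Location"])
            continue
        return url, body, round_trips
    raise RuntimeError("too many redirects")


final_url, body, round_trips = fetch("http://server.company.com/mydir")
print(final_url, round_trips)
```

The final URL ends in "/mydir/" and two round-trips were needed, which is the cost and the benefit described above: one extra exchange, but the client ends up knowing the correct base URL.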
>
> "Internal" redirects :
>
> These are things that the server does internally, without telling the browser 
> about it.  mod_rewrite allows you to internally modify a request URL before 
> the rest of the server attempts to find and serve the requested resource.  
> In that case, the browser sends a request like
> GET /mydir HTTP/1.1
> Host: server.company.com
>
> and the server, internally, modifies this "/mydir" to "/anotherdir/", then 
> proceeds to immediately serve the content of "/anotherdir/", without sending 
> a redirect to the browser, and without telling the browser about anything. 
> The browser gets a response :
> 200 OK
> ...
> .. content of "/anotherdir/"
>
> This is obviously faster, because you avoid a round-trip to the browser and 
> back, through a potentially slow connection.
>
> But now the browser does not know about the substitution, and genuinely 
> believes that what it got was the content corresponding to the "/mydir" URL. 
> So now if in this content it finds relative links like "images/myface.gif", 
> it will interpret them relative to the base URL "/mydir" and request 
> "/images/myface.gif" instead of "/anotherdir/images/myface.gif", which may 
> cause further problems.
> So by doing this, you may be saving one round-trip for the original "/mydir", 
> but at best forcing subsequent round-trips for other links, at worst 
> potentially confusing the browser into requesting further invalid URLs.
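
To make the base-URL confusion concrete, here is a small sketch, again assuming Python's urllib.parse. It resolves the same relative link against the URL the browser believes it fetched ("/mydir", after an internal rewrite), against the directory the content really came from, and against the URL the browser would hold after an external redirect to "/mydir/".

```python
from urllib.parse import urljoin

link = "images/myface.gif"

# After an internal rewrite the browser still thinks the page is "/mydir".
believed_base = "http://server.company.com/mydir"
after_internal = urljoin(believed_base, link)

# Where the link was presumably meant to point, given the content's real home.
actual_base = "http://server.company.com/anotherdir/"
intended = urljoin(actual_base, link)

# After an external 301 to "/mydir/" the browser's base URL is correct.
after_redirect = urljoin("http://server.company.com/mydir/", link)

print(after_internal)
print(intended)
print(after_redirect)
```

The first result loses the directory entirely, because "mydir" is treated as the last component of the base URL and removed before the relative link is added.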
>
> Whether one or the other scenario is better in your case depends on many 
> factors, and you have to evaluate those yourself based on your website 
> and what is really going on there.

Wow.  Thank you so much for the thorough explanation.  I really appreciate 
the time you and everyone else put into laying out how all this stuff 
interacts.

Is there any chance you could put a version of the above somewhere in the 
Apache wiki?  Lots of stuff there is applicable to rewrites and 
ServerName.  It really fills in a ton of blanks in the core documentation 
since it deals with the very basics of how the browser and server work out 
redirects.

Thanks again,

Charles

> ---------------------------------------------------------------------
> The official User-To-User support forum of the Apache HTTP Server Project.
> See <URL:http://httpd.apache.org/userslist.html> for more info.
> To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
>  "   from the digest: users-digest-unsubscribe@httpd.apache.org
> For additional commands, e-mail: users-help@httpd.apache.org
>
>
