xml-general mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: How to use XML to link to XML when the XML becomes HTML?
Date Fri, 31 Mar 2000 16:18:50 GMT
Dan Morrison wrote:

> IMO, the suffix of a file in a URI is a first-class citizen, deserving
> attention equal to the protocol. 

Why?

> However all URI breakdowns I've seen
> regard suffix to be just part of the filename.
> I know this came from Unix vs Windows history, but it's a part of life
> nowadays.

Apart from broken behavior that looks at the URI file extension before
the MIME type (as IE does), the issue is _much_ bigger than that.

In the awesome article "Cool URIs don't change" by web creator and W3C
director Tim Berners-Lee, it is clearly shown why URI stands for "uniform
resource identifier"... this means:

 uniform: they should identify _any_ available network resource
 resource: something that you are able to access from your network
 identifier: a unique address for that resource

So, think about what you do when you go to a site: you type

 http://www.apache.org

and the front page gets to you. Actually the above is, to be picky,
wrong! You should type

 http://www.apache.org/

which means: use the http protocol (with the default port for that),
resolve the host "www.apache.org" and access the resource found on path
"/". If you type

 http://www.apache.org/index.html

you make two mistakes, even if the outcome is the same:

1) you assume that the main page is called "index.html"
2) you presume that the http server is serving HTML content.
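To make the breakdown above concrete, here is a minimal sketch using Python's standard urllib.parse. The helper name breakdown and the DEFAULT_PORTS table are my own, for illustration only. Note that nothing in the result singles out a "suffix": "index.html" is simply part of the path.

```python
from urllib.parse import urlsplit

# Assumption for illustration: default ports for the schemes we care about.
DEFAULT_PORTS = {"http": 80, "https": 443}

def breakdown(uri):
    """Split a URI into the pieces discussed above (hypothetical helper)."""
    parts = urlsplit(uri)
    # parts.port is None when the URI gives no explicit port.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
    }

for uri in (
    "http://www.apache.org",
    "http://www.apache.org/",
    "http://www.apache.org/index.html",
):
    print(uri, "->", breakdown(uri))
```

The first URI comes back with an empty path, the second with "/", which is the picky difference mentioned above.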

True, the web server might well "virtualize" this same URI and use
whatever module it likes to generate the output, which might not even be
HTML but, say, a welcome WAV file!

But the mistake gets even worse once you understand that the network is
hyperlinked and there is no automatic way to tell if a link is broken
(other than following it).

So, if your URI

 http://www.apache.org/index.html

is accessible, spiders will find it and might use _that_ URI to
refer to your site. Now, what happens if you change technology and
go to, say, 

 http://www.apache.org/index.xml

all of a sudden, all those links are broken, while

 http://www.apache.org/

would work as before.
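A rough sketch of the same point, assuming a server that keeps a table from public paths to whatever file and media type currently sit behind them. The table layout, the serve function and the file names are hypothetical, not how any particular server does it.

```python
# Hypothetical routing tables: public path -> (backing file, media type).
# "before" is the HTML-era site, "after" is the site once it moved to XML.
before = {
    "/": ("index.html", "text/html"),
    "/index.html": ("index.html", "text/html"),
}
after = {
    "/": ("index.xml", "application/xml"),
    "/index.xml": ("index.xml", "application/xml"),
}

def serve(path, routes):
    """Return (status, media_type, backing_file) for a requested path."""
    if path not in routes:
        return (404, None, None)
    backing_file, media_type = routes[path]
    return (200, media_type, backing_file)

# Links that spiders may have collected while the old site was up.
for link in ("/", "/index.html"):
    print(link, "before:", serve(link, before), "after:", serve(link, after))
```

The link to "/" keeps resolving after the switch, only with a different media type behind it, while the link to "/index.html" turns into a 404.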

This is a simple and stupid example, but think about URIs like these

 http://www.amazon.com/books/489347898387794?a=4898&b=4880

compared to

 http://www.amazon.com/books/it/Dante_Alighieri/La_Divina_Commedia/paperback/comments

which one is better?

Tell me, which one is more likely to remain the same after a hundred
years?

Once you start thinking this way, everything will appear different to
you :)

It happened to me.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------


