abdera-dev mailing list archives

From James M Snell <jasn...@gmail.com>
Subject IRI Support and ICU
Date Wed, 13 Sep 2006 23:53:29 GMT
The Atom specification allows IRIs to be used anywhere within Atom
documents.  Unfortunately, Java 1.5 and earlier do not include support
for converting IRIs to URIs, which is necessary in order to get a
dereferenceable URI.  Currently we fake it by parsing IRIs as URIs, but
that definitely has a number of problems.

For instance, consider the following feed:

  http://www.詹姆斯.com/feed   (James Holderness' weblog)

If I do:

  URI uri = new URI("http://www.詹姆斯.com/feed");

The URI will be created without throwing any errors, despite the fact
that the Unicode characters are not legal in a URI.  Calling
uri.toString() will return the original string unchanged.

However, calling uri.getHost() on this URI improperly returns null.
Calling uri.getAuthority() returns the host name, but if the URI also
has a port specified, getAuthority() returns the port as well (e.g. for
"http://www.詹姆斯.com:80/feed", getAuthority() returns
"www.詹姆斯.com:80").

Worse yet, if I call uri.toASCIIString() the output from URI is
http://www.%E8%A9%B9%E5%A7%86%E6%96%AF.com/feed, which is quite clearly
wrong: the host name has been percent-encoded rather than converted
with IDN.
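
The java.net.URI behaviour described above can be reproduced with a
small program.  This is only a sketch: the class name UriHostDemo and
its constants are made up for illustration, and the host label is
written with Unicode escapes so that the source compiles under any file
encoding.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriHostDemo {
    // The feed address from above; the three CJK characters are written as
    // Unicode escapes so the file compiles under any source encoding.
    public static final String FEED =
            "http://www.\u8A79\u59C6\u65AF.com/feed";
    public static final String FEED_WITH_PORT =
            "http://www.\u8A79\u59C6\u65AF.com:80/feed";

    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI(FEED);                  // no exception is thrown
        System.out.println(uri.getHost());        // null
        System.out.println(uri.getAuthority());   // the host name only
        System.out.println(uri.toASCIIString());  // host label percent-encoded as UTF-8

        URI withPort = new URI(FEED_WITH_PORT);
        System.out.println(withPort.getAuthority()); // host name followed by ":80"
        System.out.println(withPort.getPort());      // -1, the port is not parsed out
    }
}
```

Because the host cannot be parsed as a server name, URI treats the
whole authority as an opaque registry-based name, which is why
getHost() and getPort() carry no useful information here.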

Now, all of our (IBM's) implementations have ICU [1] available, which
includes proper IDN support.  It's a simple matter to write an IRI to
URI converter.
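
As a rough illustration of what such a converter has to do, here is a
minimal sketch.  It is not the ICU-based converter discussed in this
message, and nothing about its speed has been measured.  So that the
example runs without the ICU jar, it assumes java.net.IDN, which only
exists in Java 6 and later; on Java 1.5 ICU4J's IDNA class would have
to fill that role.  The class IriToUri and its toUri method are
hypothetical names.

```java
import java.net.IDN;
import java.net.URI;

public class IriToUri {
    /** Hypothetical helper: returns an ASCII-only URI string for an IRI. */
    public static String toUri(String iri) {
        URI parsed = URI.create(iri);
        String authority = parsed.getRawAuthority();
        // No authority, or java.net.URI already recognised an ASCII host:
        // percent-encoding any remaining non-ASCII characters is enough.
        if (authority == null || parsed.getHost() != null) {
            return parsed.toASCIIString();
        }
        // Otherwise split "userinfo@host:port" by hand, because getHost()
        // and getPort() are not populated for a non-ASCII host.
        int at = authority.lastIndexOf('@');
        String userInfo = at >= 0 ? authority.substring(0, at + 1) : "";
        String hostPort = authority.substring(at + 1);
        int colon = hostPort.lastIndexOf(':');
        String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
        String port = colon >= 0 ? hostPort.substring(colon) : "";

        StringBuilder uri = new StringBuilder();
        if (parsed.getScheme() != null) {
            uri.append(parsed.getScheme()).append(':');
        }
        // IDN.toASCII applies the IDNA ToASCII operation to each host label.
        uri.append("//").append(userInfo).append(IDN.toASCII(host)).append(port);
        if (parsed.getRawPath() != null) {
            uri.append(parsed.getRawPath());
        }
        if (parsed.getRawQuery() != null) {
            uri.append('?').append(parsed.getRawQuery());
        }
        if (parsed.getRawFragment() != null) {
            uri.append('#').append(parsed.getRawFragment());
        }
        // Percent-encode whatever non-ASCII characters remain outside the host.
        return URI.create(uri.toString()).toASCIIString();
    }
}
```

With this sketch, IriToUri.toUri("http://www.詹姆斯.com/feed") returns
"http://www.xn--8ws00zhy3a.com/feed".  Only the host goes through IDN;
the path, query and fragment are percent-encoded as UTF-8.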

Unfortunately, the ICU-based conversion is *really* slow, and ICU is a
big package (3.08M for the jar) when we really don't need the whole
thing.  It's fine for platforms that already have ICU, but requiring an
additional 3.08M download so we can slowly convert an IRI to a URI
really bugs me.

That said, however, I'm not sure how we can get around it.  Even the
Jena project's IRI implementation (generally considered by those more
knowledgeable about this than I to be pretty good) depends on ICU.

So, anyway, long story short: if we want proper support for IRIs (which
we need) then we're going to have to introduce a dependency on ICU.  I'm
not happy about it, but I don't see any other way around it.


- James

[1] http://www-306.ibm.com/software/globalization/icu/index.jsp
