abdera-dev mailing list archives

From Elias Torres <el...@torrez.us>
Subject Re: Abdera and IRIs
Date Thu, 21 Sep 2006 21:46:41 GMT
+1 for trunk


James M Snell wrote:
> Ok, so I've been looking into what is needed to allow Abdera to truly
> support IRIs as called for by the Atom spec.  A week ago, the only
> viable option was to introduce a dependency on ICU, which gives us
> Unicode and IDNA support but doesn't actually provide an IRI
> implementation.  For that, we would have had to introduce yet another
> dependency on something like the Jena project's IRI implementation
> (which uses ICU).
> Now, ICU is a very nice package and is pretty much THE standard for
> handling Unicode in Java.  The problem is that it's a very large package
> and includes a whole lot more than we actually need (e.g. we don't
> need the calendar, collation, Unicode compression, etc.).
> So over the last week I've been working on some code to see how small of
> an implementation of the basic IRI/IDNA/Unicode stuff we could get and
> still claim compliance.  While more testing is needed, I've got a jar
> that weighs in at a relatively lightweight 326.5kb and provides support
> for IRI, IDNA, Punycode, Unicode Normalization, supplementary
> characters, etc.
> Working with an IRI is almost identical to working with a java.net.URI.
>   IRI iri = new IRI("http://www.詹姆斯.com/feed");
>   System.out.println(iri.toString());
>   System.out.println(iri.toASCIIString());
>   // prints:
>   //   http://www.詹姆斯.com/feed
>   //   http://www.xn--8ws00zhy3a.com/feed
>   System.out.println(iri.getHost());
>   System.out.println(iri.getASCIIHost());
>   // prints:
>   //   www.詹姆斯.com
>   //   www.xn--8ws00zhy3a.com
>   IRI iri1 = new IRI("http://www.詹姆斯.com/feed");
>   IRI iri2 = new IRI("http://www.xn--8ws00zhy3a.com/feed");
>   System.out.println(iri1.equals(iri2));
>   System.out.println(iri1.equivalent(iri2));
>   // prints:
>   //   false
>   //   true
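
For comparison, the equals-versus-equivalent distinction in the example above can be roughly approximated for host names alone with the JDK's own java.net.IDN class. This is only a sketch, not the proposed IRI code: the class name HostEquivalenceSketch and the helper equivalentHosts are made up for illustration, and the JDK class covers only IDNA host conversion, not full IRI handling.

```java
import java.net.IDN;

public class HostEquivalenceSketch {
    // Hypothetical helper: two hosts are treated as equivalent when their
    // ASCII (Punycode) forms match, ignoring case.
    static boolean equivalentHosts(String a, String b) {
        return IDN.toASCII(a).equalsIgnoreCase(IDN.toASCII(b));
    }

    public static void main(String[] args) {
        String unicodeHost = "www.詹姆斯.com";
        // Convert the internationalized host to its ASCII-compatible form.
        String asciiHost = IDN.toASCII(unicodeHost);
        System.out.println(asciiHost);
        // Plain string equality sees two different strings.
        System.out.println(unicodeHost.equals(asciiHost));
        // Comparing the ASCII forms treats them as the same host.
        System.out.println(equivalentHosts(unicodeHost, asciiHost));
    }
}
```

Comparing ASCII forms only addresses the host component; scheme-specific equivalence of the kind described in the message would need more than this.
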
> The implementation also provides things that Java's URI implementation
> doesn't, such as scheme-specific equivalence checking.
> There are even test cases already that, while not 100% comprehensive,
> provide fairly decent coverage based on examples given in the various
> RFCs implemented.
> That said...
> Right now, the IRI implementation depends on my Unicode implementation,
> which hasn't, of course, had anywhere near the level of testing ICU has
> had.  It would be possible, however, for me to change the IRI
> implementation so that it can use either ICU or my Unicode stuff
> depending on whether ICU is in the classpath.  If ICU is present, I can
> use that Unicode and IDNA implementation instead of mine.  It makes
> things a bit more complicated, but it's definitely something I can do.
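
One common way to do this kind of classpath-based switch is to probe for a well-known class at startup. The sketch below assumes that approach; it is not taken from the proposed code. The names UnicodeBackendSketch, isPresent, and chooseBackend are hypothetical, and com.ibm.icu.text.IDNA is used only as a marker class that ships with ICU4J.

```java
public class UnicodeBackendSketch {
    // Returns true if the named class can be found on the classpath.
    // The class is located but not initialized.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false,
                    UnicodeBackendSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Hypothetical selection logic: prefer ICU when it is available,
    // otherwise fall back to the bundled implementation.
    static String chooseBackend() {
        return isPresent("com.ibm.icu.text.IDNA") ? "icu" : "builtin";
    }

    public static void main(String[] args) {
        System.out.println("Unicode/IDNA backend: " + chooseBackend());
    }
}
```

A real switch would return an implementation object behind a shared interface rather than a string, so that no ICU classes are referenced directly unless the probe succeeds.
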
> What I'm proposing is that I check in my IRI/IDNA/Unicode implementation
> and that we use it as the default impl.  The code would become part of
> the parser module.  After checking the code in and updating Abdera to
> use it, I'll work on enabling the automatic ICU switch.
> or...
> I create a branch of the trunk and integrate my implementation into the
> branch.  We kick the tires around on it, see if it works, work on
> enabling the ICU switch and when we get both working and we're all
> comfortable with it, we merge back into the trunk.
> - James
