commons-dev mailing list archives

From Jeff Varszegi <>
Subject Re: [codec] Handling text encodings
Date Tue, 19 Nov 2002 09:33:57 GMT
It should work with streams, no doubt about it.  I think that there should be two separate
interfaces-- at least that's what I've usually done in such situations.  You can make a separate
Encoder and Decoder interface, and a Codec interface that extends them both.  That gives lots
of flexibility if you want to include everything in one class.
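
Roughly what I have in mind, as a minimal sketch.  All the names here (HexCodec and friends)
are made up for illustration and are not the current codec package API; the hex codec is just
a stand-in for any two-way encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical names throughout; this is not the existing codec package API.
interface Encoder {
    String encode(String plain);
}

interface Decoder {
    String decode(String encoded);
}

// A Codec is simply something that can go both ways.
interface Codec extends Encoder, Decoder {
}

// One class that does everything: text <-> lowercase hex of its UTF-8 bytes.
class HexCodec implements Codec {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    public String encode(String plain) {
        byte[] bytes = plain.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            out.append(DIGITS[(b >> 4) & 0x0f]).append(DIGITS[b & 0x0f]);
        }
        return out.toString();
    }

    public String decode(String encoded) {
        if (encoded.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] bytes = new byte[encoded.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int hi = Character.digit(encoded.charAt(2 * i), 16);
            int lo = Character.digit(encoded.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit");
            }
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

class CodecSketchDemo {
    public static void main(String[] args) {
        Codec codec = new HexCodec();
        Encoder encodeOnly = codec;              // callers can ask for just one direction
        String hex = encodeOnly.encode("abc");
        System.out.println(hex);                 // prints 616263
        System.out.println(codec.decode(hex));   // prints abc
    }
}
```

A one-way encoder such as Soundex would implement only Encoder, while something reversible
like Base64 could implement Codec and still be passed anywhere an Encoder is expected.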

Check out com.sun.image.codec.jpeg; there they have separate encoder and decoder classes.  I read
that stuff a while back and it flavored my thinking.  Now check out the classes in
com.sun.imageio.  Everything is readers and writers.  You may want to think about setting
up this way, too (or at least providing interfaces in advance to point the way, so that everything
will grow nicely together).
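
For the stream side, here is one possible shape, again only a sketch.  HexEncodingOutputStream
is a hypothetical class, and hex stands in for whatever encoding you actually need; the point
is that the encoder wraps another stream the way the java.io filter streams do, so nothing
has to be held in memory as one chunk.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical class: turns every byte written to it into two hex digits
// and passes them straight on to the wrapped stream.
class HexEncodingOutputStream extends FilterOutputStream {
    private static final byte[] DIGITS =
            "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    HexEncodingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(DIGITS[(b >> 4) & 0x0f]);
        out.write(DIGITS[b & 0x0f]);
    }
    // The array-based write methods are inherited from FilterOutputStream,
    // which calls write(int) once per byte.
}

class StreamSketchDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream enc = new HexEncodingOutputStream(sink)) {
            enc.write("Hi".getBytes(StandardCharsets.US_ASCII));
        }
        System.out.println(sink.toString("US-ASCII")); // prints 4869
    }
}
```

A matching decoder would be a FilterInputStream going the other way, and the String-based
methods can then be thin wrappers over the stream versions.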

Now, here's one more thing to think about: intermediate encodings.  I had to write some stuff
using IBM machine-translation engines a while back.  I remember thinking how dumb it was that
we needed to install a separate engine for every language pair.  Lots of pairs, as you can guess,
hadn't been implemented yet, but there were presumably thousands of IBM coders hard at work
implementing the n(n-1) engines necessary to supply comprehensive coverage for the world's languages.

They all had different dictionaries, even.  After that (actually, even before that time), a lot of
focus in the translation-research community was put on translating to an intermediate form.
Like Microsoft's CLR.  Maybe we can wrassle out (without spending too too much time) a decent
way of arranging that.
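
To make the intermediate-form idea concrete, here is a rough sketch.  Everything in it is an
assumption for illustration: the Format interface, the FormatHub class, and the choice of raw
bytes as the intermediate form.  Each format only knows how to get to and from the
intermediate form, so n formats need n small classes instead of one converter per ordered
pair.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: a format converts to and from the intermediate
// form (raw bytes in this sketch) and knows nothing about other formats.
interface Format {
    byte[] toIntermediate(String encoded);
    String fromIntermediate(byte[] raw);
}

class Utf8TextFormat implements Format {
    public byte[] toIntermediate(String encoded) {
        return encoded.getBytes(StandardCharsets.UTF_8);
    }
    public String fromIntermediate(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}

class Base64Format implements Format {
    public byte[] toIntermediate(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }
    public String fromIntermediate(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }
}

class HexDigitsFormat implements Format {
    public byte[] toIntermediate(String encoded) {
        if (encoded.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] raw = new byte[encoded.length() / 2];
        for (int i = 0; i < raw.length; i++) {
            int hi = Character.digit(encoded.charAt(2 * i), 16);
            int lo = Character.digit(encoded.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit");
            }
            raw[i] = (byte) ((hi << 4) | lo);
        }
        return raw;
    }
    public String fromIntermediate(byte[] raw) {
        StringBuilder out = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            out.append(Character.forDigit((b >> 4) & 0x0f, 16));
            out.append(Character.forDigit(b & 0x0f, 16));
        }
        return out.toString();
    }
}

// Any-to-any conversion always goes source -> intermediate -> target.
class FormatHub {
    private final Map<String, Format> formats = new HashMap<>();

    void register(String name, Format format) {
        formats.put(name, format);
    }

    String convert(String from, String to, String input) {
        Format source = formats.get(from);
        Format target = formats.get(to);
        if (source == null || target == null) {
            throw new IllegalArgumentException("unknown format");
        }
        return target.fromIntermediate(source.toIntermediate(input));
    }
}

class HubSketchDemo {
    public static void main(String[] args) {
        FormatHub hub = new FormatHub();
        hub.register("text", new Utf8TextFormat());
        hub.register("base64", new Base64Format());
        hub.register("hex", new HexDigitsFormat());
        System.out.println(hub.convert("text", "base64", "Hi"));  // prints SGk=
        System.out.println(hub.convert("base64", "hex", "SGk=")); // prints 4869
    }
}
```

Three classes cover all six ordered pairs here, and adding a fourth format means writing one
more class rather than six more converters.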


--- Ola Berg <> wrote:
> > > The codec package is very simple.  Right now it contains 3 encoders
> > > specifically geared towards language ( Soundex, RefinedSoundex, and
> > > Metaphone ).  It also contains a Base64 encoder and decoder.
> > >
> > > There is only one interface "Encoder" with one method  "public
> > > String encode(String pString)".  I think we need another interface
> > > "Decoder", with a similarly simple interface "public String decode(String
> > > pString)".
> Hmm, I see a couple of issues with this.
> 1) It encodes chunks, and not streams. This is a scalability issue.
> 2) It is geared towards text. For Bootstring, I need arbitrary symbols.
> 3) There is no need for another interface with identical signatures. Maybe a Codec class
> points out two "coders" (one encoder and one decoder).
> For the short term, I think that a Punycode codec will do, and I will of course use Encoder
> as you have put it.
> /O
