commons-dev mailing list archives

From tobr...@transolutions.net (O'brien, Tim)
Subject RE: [codec] Handling text encodings (one more thing, sorry)
Date Tue, 19 Nov 2002 14:13:17 GMT
A lot of good points.  Soundex, Metaphone, and Refined Soundex all deal
with language, so it would make more sense to move these classes into
a language subpackage.

With regard to streams, I think it makes sense for something like
Base64 - most definitely this should be a stream-oriented codec.  My
only question relates to something like Metaphone or Soundex.  The
Soundex algorithm is a truncated encoding that was primarily developed
to encode last names - for example "O'Brien" or "Varszegi".  It seems
like wrapping "O'Brien" in a StringReader just to get the Soundex "O165"
is overkill.  In other words, even if I had a 512-character String, I'm
still only producing a 4-character code (unless I use Refined Soundex).
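
To make the point concrete, here is a minimal sketch of classic American
Soundex working directly on a String, with no Reader involved.  The
class name SoundexSketch is made up for illustration, and this is not
the commons-codec implementation, just the textbook rules as I
understand them:

```java
// Hypothetical illustration class; not part of commons-codec.
final class SoundexSketch {
    // Digit for each letter A..Z; '0' marks letters with no code
    // (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    // Returns the 4-character Soundex code, or "" if the input
    // contains no ASCII letters.
    static String encode(String name) {
        StringBuilder out = new StringBuilder(4);
        char last = '0';
        for (int i = 0; i < name.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(name.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // skip apostrophes, spaces, digits, etc.
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);      // first letter is kept as-is
                last = code;
            } else if (c == 'H' || c == 'W') {
                continue;           // H and W do not separate equal codes
            } else {
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code;        // a vowel resets the "last code" state
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0');        // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("O'Brien"));
        System.out.println(encode("Varszegi"));
    }
}
```

Under these rules "O'Brien" encodes to O165 and "Varszegi" to V622.  The
loop stops as soon as four characters have been produced, so the length
of the input beyond that point costs nothing.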

The only reason I bring that up is that I need to be able to Soundex
about 120,000 strings and populate a ternary search tree in a very
limited time (2-4 seconds).  If I had to insert a "new StringReader()"
into this process, I'd imagine that I'd be waiting much longer to create
this index.

For Soundex, Metaphone, Refined Soundex, I'm more inspired by the
java.security.MessageDigest class. 

Maybe we need two concepts:

A ChunkCodec - like Soundex, Metaphone, Refined Soundex, Message
digests....
And a StreamCodec - like Base64, Rot13, compression algorithms, sound
encoding...
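
A rough sketch of how the two concepts might look.  Only the names
ChunkCodec and StreamCodec come from the idea above; the method
signatures and the two example classes are assumptions for the sake of
discussion, not a proposed API:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical: whole value in, short fixed-size code out
// (Soundex, Metaphone, message digests).
interface ChunkCodec {
    String encode(String input);
}

// Hypothetical: data is transformed incrementally as it flows through
// (Base64, Rot13, compression).
interface StreamCodec {
    void encode(Reader in, Writer out) throws IOException;
}

// Chunk example: a SHA-1 digest rendered as 40 hex characters,
// whatever the input length.
final class Sha1HexChunkCodec implements ChunkCodec {
    public String encode(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}

// Stream example: Rot13 reads one character at a time and writes one
// character at a time, so output size tracks input size.
final class Rot13StreamCodec implements StreamCodec {
    public void encode(Reader in, Writer out) throws IOException {
        int ch;
        while ((ch = in.read()) != -1) {
            out.write(rotate((char) ch));
        }
    }

    static char rotate(char c) {
        if (c >= 'a' && c <= 'z') return (char) ('a' + (c - 'a' + 13) % 26);
        if (c >= 'A' && c <= 'Z') return (char) ('A' + (c - 'A' + 13) % 26);
        return c; // non-letters pass through unchanged
    }

    // Convenience wrapper for callers that only have a String.
    static String rot13(String s) {
        StringWriter out = new StringWriter();
        try {
            new Rot13StreamCodec().encode(new StringReader(s), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

The difference shows up in the signatures: the chunk codec needs the
whole value before it can answer and returns a small result, while the
stream codec never needs more than the current character in hand.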

--------
Tim O'Brien 
Transolutions, Inc.
W 847-574-2143
M 847-863-7045


> -----Original Message-----
> From: Jeff Varszegi [mailto:jvarszegi@yahoo.com] 
> Sent: Tuesday, November 19, 2002 4:02 AM
> To: Jakarta Commons Developers List
> Subject: [codec] Handling text encodings (one more thing, sorry)
> 
> 
> I also think that if there are going to be lots of codecs in 
> the project over time, all the classes for a particular area 
> should be in subpackages, like the Base64 codec currently is. 
>  That means that the Metaphone codec etc. should be moved 
> down into a subpackage, and the codec package should just 
> have the generic stuff.  
> 
> You're really getting this insomniac's seventy-five cents' 
> worth tonight. ;O)
> 
> -Jeff
> 

--
To unsubscribe, e-mail:
<mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:commons-dev-help@jakarta.apache.org>