commons-dev mailing list archives

From Gary Gregory <ggreg...@seagullsw.com>
Subject RE: [codec] Soundex / Refined Soundex
Date Fri, 05 Dec 2003 00:05:29 GMT
Hello Matthew,

We welcome your contribution; this would be a nice addition indeed. It would
make it easier for whoever reviews and integrates your submission (me or
someone else) if you submit all code (1) in CVS patch format and, more
importantly, (2) with unit tests.

For more information on submitting patches please see:

http://jakarta.apache.org/commons/patches.html

Thank you,
Gary

> -----Original Message-----
> From: Inger, Matthew [mailto:inger@Synygy.com]
> Sent: Thursday, December 04, 2003 12:12
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] Soundex / Refined Soundex
> 
> I have the code for this method if someone will commit it.
> Basically, the higher the difference, the better the match (which
> to me makes no sense, but that's the method's definition).
> 
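> public int difference(String a, String b)
> {
>    String soundexa = soundex(a);
>    String soundexb = soundex(b);
>    // guard against null codes (e.g. from null input)
>    if (soundexa == null || soundexb == null) {
>        return 0;
>    }
>    // compare the lengths of the soundex codes, not of the input
>    // strings ("Green" and "Greene" must still score 4)
>    int length = soundexa.length();
>    int res = 0;
>    // return the highest difference (0) if the code lengths
>    // don't match
>    if (length == soundexb.length()) {
>        for (int i = 0; i < length; i++) {
>            // count positions at which the two codes agree
>            if (soundexa.charAt(i) == soundexb.charAt(i)) {
>                res++;
>            }
>        }
>    }
>    return res;
> }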
> 
> For regular Soundex, the difference would range from 0 (the worst)
> to 4 (the best).  For RefinedSoundex, it would range from 0 (the worst)
> to whatever the length of the soundex strings is, but the same
> method would work for both versions.
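The point that one method serves both encoders can be sketched by passing the encoder in as a function. This is a minimal, hypothetical sketch: `DifferenceSketch` and the upper-casing stand-in encoder are illustrative names, not [codec] API. It keeps the proposed behavior of returning 0 when the two codes differ in length, and it compares the lengths of the codes rather than of the input strings.

```java
import java.util.function.UnaryOperator;

final class DifferenceSketch {

    // Hypothetical helper: counts positions at which the two encodings agree.
    // The encoder is a parameter, so the same logic serves fixed-length
    // Soundex codes and variable-length Refined Soundex codes.
    static int difference(UnaryOperator<String> encoder, String a, String b) {
        String codeA = encoder.apply(a);
        String codeB = encoder.apply(b);
        if (codeA == null || codeB == null || codeA.length() != codeB.length()) {
            return 0; // treated as "no match", as in the proposed method
        }
        int matches = 0;
        for (int i = 0; i < codeA.length(); i++) {
            if (codeA.charAt(i) == codeB.charAt(i)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in encoder for illustration only: it just upper-cases the input.
        UnaryOperator<String> upper = s -> s.toUpperCase();
        System.out.println(difference(upper, "abcd", "ABxD")); // 3
        System.out.println(difference(upper, "abc", "abcd"));  // 0 (lengths differ)
    }
}
```

With a real encoder plugged in, the upper bound of the result is simply the code length: 4 for Soundex, longer and input-dependent for Refined Soundex.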
> 
> Here's the description from the SQL Server help:
> 
> DIFFERENCE
> Returns the difference between the SOUNDEX values of two character
> expressions as an integer.
> 
> Syntax
> DIFFERENCE ( character_expression , character_expression )
> 
> Arguments
> character_expression
> 
> Is an expression of type char or varchar.
> 
> Return Types
> int
> 
> Remarks
> The integer returned is the number of characters in the SOUNDEX values
> that are the same. The return value ranges from 0 through 4, with 4
> indicating the SOUNDEX values are identical.
> 
> Examples
> In the first part of this example, the SOUNDEX values of two very similar
> strings are compared, and DIFFERENCE returns a value of 4. In the second
> part of this example, the SOUNDEX values for two very different strings
> are compared, and DIFFERENCE returns a value of 0.
> 
> USE pubs
> GO
> -- Returns a DIFFERENCE value of 4, the least possible difference.
> SELECT SOUNDEX('Green'),
>   SOUNDEX('Greene'), DIFFERENCE('Green','Greene')
> GO
> -- Returns a DIFFERENCE value of 0, the highest possible difference.
> SELECT SOUNDEX('Blotchet-Halls'),
>   SOUNDEX('Greene'), DIFFERENCE('Blotchet-Halls', 'Greene')
> GO
> 
> Here is the result set:
> 
> ----- ----- -----------
> G650  G650  4
> 
> (1 row(s) affected)
> 
> ----- ----- -----------
> B432  G650  0
> 
> (1 row(s) affected)
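A self-contained sketch that reproduces the two result rows above. It assumes the standard American Soundex rules (first letter kept, vowels separating letters with equal codes while H and W do not, codes padded or truncated to four characters) and reads the Remarks literally as a position-by-position count. `SoundexSketch` is a hypothetical name; this is neither SQL Server's internal DIFFERENCE algorithm nor the [codec] implementation, and either may differ on edge cases.

```java
final class SoundexSketch {
    // American Soundex digit for each letter A..Z; '0' marks letters that
    // are never coded (A E I O U Y H W).
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String s) {
        StringBuilder out = new StringBuilder();
        char last = 0;
        for (char ch : s.toUpperCase().toCharArray()) {
            if (ch < 'A' || ch > 'Z') {
                continue; // ignore hyphens, spaces and other non-letters
            }
            char code = CODES.charAt(ch - 'A');
            if (out.length() == 0) {
                out.append(ch); // the first letter is kept, not coded
            } else if (code != '0' && code != last) {
                out.append(code); // skip uncoded letters and repeated codes
            }
            // Vowels reset the run of equal codes; H and W do not.
            if (ch != 'H' && ch != 'W') {
                last = code;
            }
            if (out.length() == 4) {
                break; // Soundex codes are truncated to four characters
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    // Counts positions at which the two codes agree (0 = worst, 4 = best).
    static int difference(String a, String b) {
        String x = soundex(a);
        String y = soundex(b);
        if (x.length() != y.length()) {
            return 0;
        }
        int same = 0;
        for (int i = 0; i < x.length(); i++) {
            if (x.charAt(i) == y.charAt(i)) {
                same++;
            }
        }
        return same;
    }

    public static void main(String[] args) {
        System.out.println(soundex("Green") + "  " + soundex("Greene") + "  "
                + difference("Green", "Greene"));
        System.out.println(soundex("Blotchet-Halls") + "  " + soundex("Greene") + "  "
                + difference("Blotchet-Halls", "Greene"));
    }
}
```

Running `main` prints `G650  G650  4` and `B432  G650  0`. Under the same rules, `soundex("testing")` gives `T235`, matching the [codec] example quoted later in this thread.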
> 
> 
> 
> -----Original Message-----
> From: Inger, Matthew [mailto:inger@Synygy.com]
> Sent: Thursday, December 04, 2003 2:53 PM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] Soundex / Refined Soundex
> 
> 
> Any thoughts on the "difference" method?
> 
> 
> -----Original Message-----
> From: Gary Gregory [mailto:ggregory@seagullsw.com]
> Sent: Thursday, December 04, 2003 12:18 PM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] Soundex / Refined Soundex
> 
> 
> Hello,
> 
> Thank you for your interest in [codec].
> 
> Soundex is, well, Soundex, a method to find words with similar phonemes.
> 
> Refined Soundex, OTOH, is more geared towards spellchecking.
> 
> For example:
> 
> new Soundex().encode("testing") returns "T235"
> new RefinedSoundex().encode("testing") returns "T6036084"
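To make the contrast concrete, here is a minimal sketch of Refined Soundex. It assumes the commonly published letter groups (B P = 1, F V = 2, C K S = 3, G J = 4, Q X Z = 5, D T = 6, L = 7, M N = 8, R = 9, vowels and H W Y = 0) and assumes the [codec] class keeps the first letter, codes every letter including the first, and collapses adjacent repeated codes. `RefinedSoundexSketch` is a hypothetical name, not the [codec] class. Because the code is never padded or truncated, its length varies with the input, which is why the proposed `difference` range for Refined Soundex depends on code length.

```java
final class RefinedSoundexSketch {
    // Refined Soundex digit for each letter A..Z (assumed to match [codec]).
    private static final String CODES = "01360240043788015936020505";

    static String refinedSoundex(String s) {
        StringBuilder out = new StringBuilder();
        char last = '*'; // sentinel: no previous code yet
        for (char ch : s.toUpperCase().toCharArray()) {
            if (ch < 'A' || ch > 'Z') {
                continue; // non-letters are ignored
            }
            if (out.length() == 0) {
                out.append(ch); // keep the first letter as a prefix
            }
            char code = CODES.charAt(ch - 'A');
            if (code != last) {
                out.append(code); // code every letter, collapsing repeats
            }
            last = code;
        }
        return out.toString(); // no padding or truncation
    }

    public static void main(String[] args) {
        System.out.println(refinedSoundex("testing")); // T6036084
    }
}
```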
> 
> Gary
> 
> > -----Original Message-----
> > From: Inger, Matthew [mailto:inger@Synygy.com]
> > Sent: Thursday, December 04, 2003 09:08
> > To: 'Jakarta Commons Developers List'
> > Subject: [codec] Soundex / Refined Soundex
> >
> > Can anyone tell me the difference between these two soundex
> > implementations?  Also, is there any planned support for a
> > difference algorithm for soundex (similar to the one provided
> > by SQLServer?)
> >
> > We are looking for a soundex implementation to use in our
> > software.  Thanks in advance for your help.
