commons-dev mailing list archives

From Gary Gregory <ggreg...@seagullsw.com>
Subject RE: [codec] Soundex / Refined Soundex
Date Fri, 05 Dec 2003 13:40:30 GMT
Ok, great Matthew, thanks for using Bugzilla.

Unless someone else beats me to it, I'll take a look at this next week when
I get back... [no computers for the next 4 days!]

Thanks,
Gary

> -----Original Message-----
> From: Inger, Matthew [mailto:inger@Synygy.com]
> Sent: Friday, December 05, 2003 05:35
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] Soundex / Refined Soundex
> 
> Not a problem.  I was just throwing it out there as a
> suggestion, and showing an example.  I'm more than willing to
> submit a patch for it. :)
> 
> I'll come up with a few test cases and add them as well.
> 
> I'll also create a bug in bugzilla, and attach the stuff
> there.
> 
> 
> -----Original Message-----
> From: Tim O'Brien [mailto:tobrien@discursive.com]
> Sent: Friday, December 05, 2003 8:27 AM
> To: Jakarta Commons Developers List
> Subject: Re: [codec] Soundex / Refined Soundex
> 
> 
> +1, Matthew.  Submit a patch for this, preferably on Bugzilla.
> 
> Tim
> 
> Gary Gregory wrote:
> 
> >Hello Matthew,
> >
> >We welcome your contribution; this would be a nice addition indeed. It
> >would make it easier for the person who will consider and/or integrate
> >your submission (me or another) if you submit all code in (1) CVS patch
> >format and, more importantly, (2) with unit tests.
> >
> >For more information on submitting patches please see:
> >
> >http://jakarta.apache.org/commons/patches.html
> >
> >Thank you,
> >Gary
> >
> >
> >
> >>-----Original Message-----
> >>From: Inger, Matthew [mailto:inger@Synygy.com]
> >>Sent: Thursday, December 04, 2003 12:12
> >>To: 'Jakarta Commons Developers List'
> >>Subject: RE: [codec] Soundex / Refined Soundex
> >>
> >>I have the code for this method if someone will commit it.
> >>Basically, the higher the difference, the better the match (which
> >>to me makes no sense, but that's the method's definition).
> >>
> >>public int difference(String a, String b)
> >>{
> >>   String soundexa = soundex(a);
> >>   String soundexb = soundex(b);
> >>   // compare the encoded lengths, not the input lengths
> >>   int alength = soundexa.length();
> >>   int res = 0;
> >>   // return 0 (the highest difference) if the encoded
> >>   // lengths don't match
> >>   if (alength == soundexb.length()) {
> >>       for (int i=0;i<alength;i++) {
> >>           if (soundexa.charAt(i) == soundexb.charAt(i)) {
> >>               res++;
> >>           }
> >>       }
> >>   }
> >>   return res;
> >>}
> >>
> >>For regular soundex, the difference would range from 0 (the worst)
> >>to 4 (the best).  For RefinedSoundex, it would be from 0 (the worst)
> >>to whatever the length of the soundex strings is, but the same
> >>method would work for both versions.
> >>
> >>Here's the description from the SQLServer help:
> >>
> >>DIFFERENCE
> >>Returns the difference between the SOUNDEX values of two character
> >>expressions as an integer.
> >>
> >>Syntax
> >>DIFFERENCE ( character_expression , character_expression )
> >>
> >>Arguments
> >>character_expression
> >>
> >>Is an expression of type char or varchar.
> >>
> >>Return Types
> >>int
> >>
> >>Remarks
> >>The integer returned is the number of characters in the SOUNDEX values
> >>that are the same. The return value ranges from 0 through 4, with 4
> >>indicating the SOUNDEX values are identical.
> >>
> >>Examples
> >>In the first part of this example, the SOUNDEX values of two very
> >>similar strings are compared, and DIFFERENCE returns a value of 4. In
> >>the second part of this example, the SOUNDEX values for two very
> >>different strings are compared, and DIFFERENCE returns a value of 0.
> >>
> >>USE pubs
> >>GO
> >>-- Returns a DIFFERENCE value of 4, the least possible difference.
> >>SELECT SOUNDEX('Green'),
> >>  SOUNDEX('Greene'), DIFFERENCE('Green','Greene')
> >>GO
> >>-- Returns a DIFFERENCE value of 0, the highest possible difference.
> >>SELECT SOUNDEX('Blotchet-Halls'),
> >>  SOUNDEX('Greene'), DIFFERENCE('Blotchet-Halls', 'Greene')
> >>GO
> >>
> >>Here is the result set:
> >>
> >>----- ----- -----------
> >>G650  G650  4
> >>
> >>(1 row(s) affected)
> >>
> >>----- ----- -----------
> >>B432  G650  0
> >>
> >>(1 row(s) affected)
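
The counting rule in the remarks above (the result is the number of positions at which the two SOUNDEX values agree) can be sketched in Java roughly as follows. This is only an illustration, not SQL Server's or [codec]'s implementation: the class name `SoundexDifferenceSketch` is hypothetical, and the method is assumed to receive strings that are already Soundex-encoded. Returning 0 when the encoded lengths differ is an assumption taken from the comment in the proposed method; counting over the shorter length would be another reasonable choice.

```java
final class SoundexDifferenceSketch {

    /**
     * Counts the positions at which two already-encoded Soundex strings
     * hold the same character. Higher means a closer match.
     */
    static int difference(String codeA, String codeB) {
        // Assumption: codes of unequal length are treated as the worst match.
        if (codeA.length() != codeB.length()) {
            return 0;
        }
        int same = 0;
        for (int i = 0; i < codeA.length(); i++) {
            if (codeA.charAt(i) == codeB.charAt(i)) {
                same++;
            }
        }
        return same;
    }

    public static void main(String[] args) {
        // The two code pairs from the result set above.
        System.out.println(difference("G650", "G650")); // prints 4
        System.out.println(difference("B432", "G650")); // prints 0
    }
}
```

Because the comparison runs on the encoded strings, the same method would apply to codes of any fixed length, not just the four-character classic form.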
> >>
> >>
> >>
> >>-----Original Message-----
> >>From: Inger, Matthew [mailto:inger@Synygy.com]
> >>Sent: Thursday, December 04, 2003 2:53 PM
> >>To: 'Jakarta Commons Developers List'
> >>Subject: RE: [codec] Soundex / Refined Soundex
> >>
> >>
> >>Any thoughts on the "difference" method?
> >>
> >>
> >>-----Original Message-----
> >>From: Gary Gregory [mailto:ggregory@seagullsw.com]
> >>Sent: Thursday, December 04, 2003 12:18 PM
> >>To: 'Jakarta Commons Developers List'
> >>Subject: RE: [codec] Soundex / Refined Soundex
> >>
> >>
> >>Hello,
> >>
> >>Thank you for your interest in [codec].
> >>
> >>Soundex is, well, Soundex, a method to find words with similar phonemes.
> >>
> >>Refined Soundex, OTOH, is more geared towards spellchecking.
> >>
> >>For example:
> >>
> >>new Soundex().encode("testing") returns "T235"
> >>new RefinedSoundex().encode("testing") returns "T6036084"
> >>
> >>Gary
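
To make the "T235" result for classic Soundex concrete, here is a minimal sketch of the usual American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent equal codes, and pad with zeros to four characters. It is not the [codec] source. The class name `SoundexSketch` is hypothetical, and the handling of H and W (they do not separate equal codes) is an assumption based on one common variant of the rules. Refined Soundex uses a different mapping and is not covered here.

```java
import java.util.Locale;

final class SoundexSketch {

    // Digit for each letter A..Z; '0' marks letters that are not encoded
    // (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String input) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char c : input.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a plain letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                last = code;
            } else if (c == 'H' || c == 'W') {
                continue; // assumption: H and W do not separate equal codes
            } else {
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code; // a vowel resets the "previous code"
            }
            if (out.length() == 4) {
                break;
            }
        }
        // Pad short codes with zeros, e.g. "G65" becomes "G650".
        while (out.length() > 0 && out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("testing")); // prints T235
        System.out.println(soundex("Green"));   // prints G650
    }
}
```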
> >>
> >>
> >>
> >>>-----Original Message-----
> >>>From: Inger, Matthew [mailto:inger@Synygy.com]
> >>>Sent: Thursday, December 04, 2003 09:08
> >>>To: 'Jakarta Commons Developers List'
> >>>Subject: [codec] Soundex / Refined Soundex
> >>>
> >>>Can anyone tell me the difference between these two soundex
> >>>implementations?  Also, is there any planned support for a
> >>>difference algorithm for soundex (similar to the one provided
> >>>by SQLServer?)
> >>>
> >>>We are looking for a soundex implementation to use in our
> >>>software.  Thanks in advance for your help.
> >>>
> >>>
> >
> >
> >
> 
> 
> 
