commons-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 28457] - [codec] Metaphone B not handling ending MB correctly
Date Mon, 19 Apr 2004 02:39:46 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=28457>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=28457

[codec] Metaphone B not handling ending MB correctly

tobrien@discursive.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED



------- Additional Comments From tobrien@discursive.com  2004-04-19 02:39 -------
This issue has been addressed. For the record, here is an excerpt from one of
my emails to commons-dev:

> I uncovered a potential bug in Metaphone.  The code in question deals with
> the encoding of 'B':
> 
> // START CODE from Metaphone
> 
> case 'B' :
>     if ((n > 0) && !(n + 1 == wdsz) &&
>         (local.charAt(n - 1) == 'M')) { // not MB at end of word
>         code.append(symb);
>     } else {
>         code.append(symb);
>     }
>     mtsz++;
>     break;
> 
> // END CODE
> 
> My understanding is that we should not encode a 'B' if a word ends in
> "MB" (following http://aspell.sourceforge.net/metaphone/metaphone-kuhn.txt).
> The original code appends the 'B' in both branches of the "if", so the
> "MB" check has no effect.  The Metaphone of "COMB" should be "KM" not
> "KMB", and the Metaphone of "TOMB" should be "TM" not "TMB".  I
> "refactored" this code a bit and came up with the following:
> 
> case 'B' :
>     if ( isPreviousChar(local, n, 'M') &&
>          isLastChar(wdsz, n) ) {
>         // B is silent if word ends in MB
>         break;
>     } else {
>         code.append(symb);
>     }
>     break;
> 
> Also, this code was copied outright from a C++ program; there was no
> need to keep track of the length of our StringBuffer in a variable
> named "mtsz".  That's gone, and the only reason the change was possible
> was great code coverage.
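
The excerpt above calls isPreviousChar and isLastChar without showing their
bodies. The following is a minimal sketch of what such helpers might look
like, inferred only from how they are called; the class name SilentBSketch,
the method dropSilentB, and the helper bodies are assumptions for
illustration, not the code committed to [codec]. The loop applies only the
"silent B after M at end of word" rule and copies every other character
unchanged, so its output is not a real Metaphone code.

```java
import java.util.Locale;

public class SilentBSketch {

    // Hypothetical helper: true if the character just before index n is c.
    static boolean isPreviousChar(CharSequence word, int n, char c) {
        return n > 0 && n < word.length() && word.charAt(n - 1) == c;
    }

    // Hypothetical helper: true if index n is the last position
    // in a word of length wdsz.
    static boolean isLastChar(int wdsz, int n) {
        return n + 1 == wdsz;
    }

    // Applies only the terminal-"MB" rule; this is NOT a Metaphone encoder.
    static String dropSilentB(String word) {
        String local = word.toUpperCase(Locale.ENGLISH);
        int wdsz = local.length();
        StringBuilder code = new StringBuilder();
        for (int n = 0; n < wdsz; n++) {
            char symb = local.charAt(n);
            if (symb == 'B'
                    && isPreviousChar(local, n, 'M')
                    && isLastChar(wdsz, n)) {
                continue; // B is silent if word ends in MB
            }
            code.append(symb);
        }
        return code.toString();
    }

    public static void main(String[] args) {
        System.out.println(dropSilentB("COMB"));   // prints COM
        System.out.println(dropSilentB("TOMB"));   // prints TOM
        System.out.println(dropSilentB("NUMBER")); // prints NUMBER
    }
}
```

Note that "NUMBER" keeps its 'B': the letter follows 'M' but is not the last
character, so the rule does not apply.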

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

