http://issues.apache.org/bugzilla/show_bug.cgi?id=28457
[codec] Metaphone B not handling ending MB correctly
tobrien@discursive.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
------- Additional Comments From tobrien@discursive.com 2004-04-19 02:39 -------
This issue has been addressed. For the record, here is an excerpt from one of
my emails to commons-dev:
> I uncovered a potential bug in Metaphone. The code in question deals
> with the encoding of 'B':
>
> // START CODE from Metaphone
>
> case 'B' :
>     if ((n > 0) && !(n + 1 == wdsz) &&
>         (local.charAt(n - 1) == 'M')) { // not MB at end of word
>         code.append(symb);
>     } else {
>         code.append(symb);
>     }
>     mtsz++;
>     break;
>
> // END CODE
>
> My understanding is that we should not encode a 'B' if a word ends in
> "MB" (following http://aspell.sourceforge.net/metaphone/metaphone-kuhn.txt).
> So the Metaphone of "COMB" is "KM" not "KMB", and the Metaphone of "TOMB"
> is "TM" not "TMB". I "refactored" this code a bit and came up with the
> following:
>
> case 'B' :
>     if ( isPreviousChar(local, n, 'M') &&
>          isLastChar(wdsz, n) ) {
>         // B is silent if word ends in MB
>         break;
>     } else {
>         code.append(symb);
>     }
>     break;
>
> Also, this code was copied outright from a C++ program. There was no
> need to track the length of our StringBuffer in a separate variable
> named "mtsz", since StringBuffer.length() already reports it.
> That variable is gone, and the only reason this refactoring was possible
> was great code coverage.
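To make the "silent B after M at the end of a word" rule concrete, here is a minimal, self-contained Java sketch of only the 'B' case. The email names isPreviousChar and isLastChar but does not show their bodies, so the implementations below are assumptions. The class MetaphoneBRule and the method applyBRule are hypothetical. Every letter other than 'B' is copied through unchanged, so the output is not a real Metaphone key. It only isolates the effect of the fixed rule.

```java
import java.util.Locale;

class MetaphoneBRule {

    // Assumed implementation of the helper named in the email:
    // true if the character just before index n is c.
    static boolean isPreviousChar(String word, int n, char c) {
        return n > 0 && n <= word.length() && word.charAt(n - 1) == c;
    }

    // Assumed implementation: true if n is the last index of a word of length wdsz.
    static boolean isLastChar(int wdsz, int n) {
        return n + 1 == wdsz;
    }

    // Hypothetical driver that applies ONLY the 'B' rule; all other letters
    // are copied through as-is, so this is not a full Metaphone encoder.
    static String applyBRule(String word) {
        String local = word.toUpperCase(Locale.ROOT);
        int wdsz = local.length();
        StringBuilder code = new StringBuilder();
        for (int n = 0; n < wdsz; n++) {
            char symb = local.charAt(n);
            switch (symb) {
                case 'B':
                    // B is silent only when it is the final letter and follows an M.
                    if (!(isPreviousChar(local, n, 'M') && isLastChar(wdsz, n))) {
                        code.append(symb);
                    }
                    break;
                default:
                    code.append(symb);
                    break;
            }
        }
        // No separate length counter (like the old "mtsz"): length() reports it.
        return code.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"COMB", "TOMB", "BOMB", "NUMBER"}) {
            String result = applyBRule(w);
            System.out.println(w + " -> " + result + " (length " + result.length() + ")");
        }
    }
}
```

Running main prints "COMB -> COM", "TOMB -> TOM" and "BOMB -> BOM", because in each case the trailing B follows an M. "NUMBER" keeps its B because that MB is not at the end of the word. Folding the silent case into a single negated condition avoids the early break inside the if. The behavior is the same as the refactored fragment above.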