commons-issues mailing list archives

From "Tamas Kende (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CODEC-125) Implement a Beider-Morse phonetic matching codec
Date Wed, 27 Jul 2011 07:44:13 GMT

    [ https://issues.apache.org/jira/browse/CODEC-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071586#comment-13071586 ]

Tamas Kende commented on CODEC-125:
-----------------------------------

Hi all,

I found a performance issue (maybe it is acceptable, or maybe there is no way to fix it) when I
tried to encode the given names "Salamon Hirsch Sándor".
At first I thought it had gone into an infinite loop, but it seems it does finish after a while.
I wrote a small test (I know it is not a real test) to show the speed of the encode method.
I don't know what maximum length is acceptable, but it can hang even on short strings (under 20
characters). I gave up waiting for the encoding of the longest English surname, MacGhilleseatheanaich.
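Because a single slow encode can block a benchmark indefinitely, each call could be bounded by a timeout so the run reports "timed out" instead of hanging. The following is a minimal stdlib-only sketch: `TimedEncode` and `timeEncode` are hypothetical names, and the encoder is passed as a plain String-to-String function rather than the codec class itself.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.UnaryOperator;

public class TimedEncode {

    /**
     * Runs encoder.apply(input) on a worker thread and returns the elapsed
     * milliseconds, or -1 if it has not finished after timeoutMs.
     */
    public static long timeEncode(UnaryOperator<String> encoder, String input, long timeoutMs) {
        // Daemon thread: a runaway encode will not keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-encode");
            t.setDaemon(true);
            return t;
        });
        try {
            long start = System.nanoTime();
            Future<String> result = executor.submit(() -> encoder.apply(input));
            try {
                result.get(timeoutMs, TimeUnit.MILLISECONDS);
                return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            } catch (TimeoutException e) {
                // Only requests an interrupt; CPU-bound code that never checks
                // the interrupt flag keeps running on the daemon thread.
                result.cancel(true);
                return -1;
            } catch (ExecutionException e) {
                throw new IllegalStateException("encoder failed", e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in encoder for demonstration; substitute the real codec here.
        long ms = timeEncode(String::toUpperCase, "MacGhilleseatheanaich", 10_000);
        System.out.println(ms < 0 ? "timed out" : "Elapsed time in ms:" + ms);
    }
}
```

Since `BeiderMorseEncoder.encode(String)` declares the checked `EncoderException`, calling it through this helper would need a small wrapping lambda that rethrows it as an unchecked exception.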

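{code:title=BeiderMorseSpeedCheck.java|borderStyle=solid}
import java.util.Random;

import org.apache.commons.codec.EncoderException;
import org.apache.commons.codec.language.bm.BeiderMorseEncoder;
import org.apache.commons.codec.language.bm.NameType;
import org.apache.commons.codec.language.bm.RuleType;
import org.junit.Test;

public class BeiderMorseSpeedCheck {

	// Not a real unit test: it only prints how long encode() takes while the
	// input grows by one random character per iteration.
	@Test
	public void speedCheck() throws EncoderException {
		char[] chars = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'o', 'u' };
		BeiderMorseEncoder bmpm = new BeiderMorseEncoder();
		bmpm.setNameType(NameType.GENERIC);
		bmpm.setRuleType(RuleType.APPROX);
		StringBuilder input = new StringBuilder();
		Random rand = new Random();
		input.append(chars[rand.nextInt(chars.length)]);
		for (int i = 0; i < 20; i++) {
			System.out.println("String to encode:" + input);
			// Time only the encode call, not the console output.
			long start = System.currentTimeMillis();
			bmpm.encode(input.toString());
			System.out.println("Elapsed time in ms:" + (System.currentTimeMillis() - start));
			input.append(chars[rand.nextInt(chars.length)]);
		}
	}
}
{code}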
And here is an example output. It seems the speed depends heavily on the diversity of the input
(maybe that is obvious, but I am just an end user):
{code}
...
String to encode:ouddudgeef
Elapsed time in ms:42
String to encode:ouddudgeefc
Elapsed time in ms:573
String to encode:ouddudgeefca
Elapsed time in ms:902
String to encode:ouddudgeefcao
Elapsed time in ms:921
String to encode:ouddudgeefcaoo
Elapsed time in ms:2927
String to encode:ouddudgeefcaooa
{code}
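To put numbers on how uneven the slowdown is, the ratio between consecutive timings can be computed directly from the output above. This small sketch (class and method names are illustrative) only does that arithmetic on the logged values; it does not run the codec.

```java
import java.util.Locale;

public class SlowdownRatios {

    /** Ratio of each timing to the previous one. */
    public static double[] ratios(long[] millis) {
        double[] result = new double[millis.length - 1];
        for (int i = 1; i < millis.length; i++) {
            result[i - 1] = (double) millis[i] / millis[i - 1];
        }
        return result;
    }

    public static void main(String[] args) {
        // Input lengths and elapsed times copied from the example output.
        int[] lengths = {10, 11, 12, 13, 14};
        long[] millis = {42, 573, 902, 921, 2927};
        double[] r = ratios(millis);
        for (int i = 0; i < r.length; i++) {
            System.out.printf(Locale.ROOT, "%d -> %d chars: x%.2f%n", lengths[i], lengths[i + 1], r[i]);
        }
    }
}
```

It prints factors of x13.64, x1.57, x1.02 and x3.18, so appending one character multiplied the time by anything from about 1 to about 14, depending on which character was added.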

If there is no way to fix this, the javadoc should contain some information about how to speed
up the encoding (I did not try changing the NameType or RuleType, or using a language-specific
setting).
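One possible reason for this behaviour, stated as an assumption and not as a diagnosis of the codec: Beider-Morse produces a set of alternative phonetic forms, and if each ambiguous letter can map to several sounds, the number of candidates multiplies with every letter. The toy model below uses an invented letter-to-sounds table (`ALTERNATIVES` is hypothetical and unrelated to the real BMPM rule files) to show that multiplicative growth, and how a hypothetical `maxAlternatives` cap bounds the work per letter at the cost of dropping some candidates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AlternativeExpansion {

    // Hypothetical letter -> possible sounds table, NOT the real BMPM rules.
    static final Map<Character, List<String>> ALTERNATIVES = Map.of(
            'a', List.of("a", "o"),
            'c', List.of("k", "ts", "tS"),
            'e', List.of("e", "i"),
            'g', List.of("g", "dZ"),
            'o', List.of("o", "u"));

    /**
     * Expands every letter into its alternatives and builds all combinations.
     * If maxAlternatives > 0, only the first maxAlternatives partial results
     * are kept after each letter.
     */
    public static List<String> expand(String input, int maxAlternatives) {
        List<String> partial = new ArrayList<>();
        partial.add("");
        for (char ch : input.toCharArray()) {
            // Letters missing from the table have exactly one sound: themselves.
            List<String> sounds = ALTERNATIVES.getOrDefault(ch, List.of(String.valueOf(ch)));
            List<String> next = new ArrayList<>();
            for (String prefix : partial) {
                for (String sound : sounds) {
                    next.add(prefix + sound);
                }
            }
            if (maxAlternatives > 0 && next.size() > maxAlternatives) {
                next = new ArrayList<>(next.subList(0, maxAlternatives));
            }
            partial = next;
        }
        return partial;
    }

    public static void main(String[] args) {
        String name = "ouddudgeefcaoo";
        System.out.println("uncapped: " + expand(name, 0).size());
        System.out.println("capped at 20: " + expand(name, 20).size());
    }
}
```

For "ouddudgeefcaoo" this prints 384 uncapped candidates (seven two-way letters and one three-way letter, 2^7 * 3) versus 20 with the cap. In this model a cap keeps each step proportional to the cap size rather than to everything generated so far, but any dropped alternative could cause a missed match.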

> Implement a Beider-Morse phonetic matching codec
> ------------------------------------------------
>
>                 Key: CODEC-125
>                 URL: https://issues.apache.org/jira/browse/CODEC-125
>             Project: Commons Codec
>          Issue Type: New Feature
>            Reporter: Matthew Pocock
>            Priority: Minor
>         Attachments: bm-gg.diff, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch
>
>
> I have implemented Beider Morse Phonetic Matching as a codec against the commons-codec svn trunk. I would like to contribute this to commons-codec.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
