commons-issues mailing list archives

From "Marc Pompl (JIRA)" <>
Subject [jira] Created: (CODEC-107) Enhance documentation for Language Encoders
Date Sat, 11 Dec 2010 00:27:01 GMT
Enhance documentation for Language Encoders

                 Key: CODEC-107
             Project: Commons Codec
          Issue Type: Improvement
    Affects Versions: 1.4
            Reporter: Marc Pompl
            Priority: Minor
             Fix For: 1.5

The current user guide lists only four Language Encoders, but there are
currently five. CODEC-106 implements a sixth one.
It would be a good idea to complete the documentation.

Additionally, I suggest extending the wiki
to show a simple performance measurement:


Metaphone encodings per sec: 32258
DoubleMetaphone encodings per sec: 31250
Soundex encodings per sec: 35714
RefinedSoundex encodings per sec: 34482
Caverphone encodings per sec: 5813
ColognePhonetic encodings per sec: 33333

So Caverphone is much slower than any other algorithm, by roughly a factor of five to six. All
the others show nearly the same performance.

Checked with the following code:

  public void checkSpeed() throws Exception {
      checkSpeedEncoding("Metaphone", "easgasg", "ESKS");
      checkSpeedEncoding("DoubleMetaphone", "easgasg", "ASKS");
      checkSpeedEncoding("Soundex", "easgasg", "E220");
      checkSpeedEncoding("RefinedSoundex", "easgasg", "E034034");
      checkSpeedEncoding("Caverphone", "Carlene", "KLN1111111");
      checkSpeedEncoding("ColognePhonetic", "Schmitt", "862");
  }

  // Encodes toBeEncoded REPEATS times and prints the resulting throughput.
  // REPEATS and assertAlgorithm(...) are not shown in this message.
  private void checkSpeedEncoding(String encoder, String toBeEncoded, String estimated)
          throws Exception {
      long start = System.currentTimeMillis();
      for (int i = 0; i < REPEATS; i++) {
          assertAlgorithm(encoder, "false", toBeEncoded, new String[] { estimated });
      }
      long duration = System.currentTimeMillis() - start;
      // Multiply before dividing: duration/1000 truncates to whole seconds
      // and becomes 0 (division by zero) for runs shorter than one second.
      System.out.println(encoder + " encodings per sec: "
              + (REPEATS * 1000L / Math.max(duration, 1L)));
  }
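
The same kind of measurement could also be sketched as a small stand-alone harness that does not depend on the test class. This is only an illustration under stated assumptions: the class name ThroughputSketch, the helper methods perSecond and measure, and the upper-casing stand-in encoder are all hypothetical and not part of Commons Codec. It uses System.nanoTime() and one warm-up pass, which tends to give steadier numbers than a single System.currentTimeMillis() run, though it is still no substitute for a real benchmarking tool.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-alone sketch; none of these names come from Commons Codec.
final class ThroughputSketch {

    // Converts a repeat count and an elapsed time in nanoseconds
    // into operations per second, using floating-point arithmetic.
    static double perSecond(long repeats, long elapsedNanos) {
        // Guard against a zero reading from a coarse timer.
        long safeNanos = Math.max(elapsedNanos, 1L);
        return repeats * 1_000_000_000.0 / safeNanos;
    }

    // Applies the encoder to the input `repeats` times (after one
    // warm-up pass of the same length) and returns the throughput.
    static double measure(UnaryOperator<String> encoder, String input, int repeats) {
        int sink = 0;
        for (int i = 0; i < repeats; i++) {
            sink += encoder.apply(input).length(); // warm-up, not timed
        }
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            sink += encoder.apply(input).length();
        }
        long elapsed = System.nanoTime() - start;
        // Reference the accumulated value so the loops are not trivially dead code.
        if (sink == -1) {
            System.out.println(sink);
        }
        return perSecond(repeats, elapsed);
    }

    public static void main(String[] args) {
        // Assumption: upper-casing stands in for a real phonetic encoder call.
        UnaryOperator<String> standIn = s -> s.toUpperCase();
        System.out.printf("stand-in encodings per sec: %.0f%n",
                measure(standIn, "easgasg", 100_000));
    }
}
```

To compare real encoders, the stand-in lambda would be replaced by a call into the encoder under test; the absolute numbers will vary with the JVM and the machine.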

