commons-issues mailing list archives

From "Matthew Pocock (JIRA)" <>
Subject [jira] [Commented] (CODEC-125) Implement a Beider-Morse phonetic matching codec
Date Fri, 01 Jul 2011 14:58:28 GMT


Matthew Pocock commented on CODEC-125:

I have renamed the bmpm package to bm. Do you want me to move BeiderMorseEncoder into the
bm package? I put it in the language package because that is where all the other encoders
are, and I presume having them in that package allows them to be automagically imported by
things like the Lucene configuration files. However, I put everything else in bm because
it is specific to the BMPM method and is worth having publicly visible: you can do some
custom things with it that are not reasonable to expose through the codec. It also has no
relevance to the other codecs, so I didn't want to clutter up the primary package.

So, I've applied the patch on Ubuntu to a clean checkout of commons-codec. This failed to
pass all tests because the empty files in the patch were not created as empty files in the
source tree. I did not know that patch behaved like this. Anyway, I've put a comment in every
otherwise empty file, and now on Ubuntu the patch applies cleanly to commons-codec and results
in a project that builds without errors.

Then I made a clean checkout of commons-codec on Windows 7 and applied the revised patch
using TortoiseSVN. When I build this, I get errors. It looks like Windows is mangling the
Unicode text files while the patch is being applied. You said that you were seeing '?' characters
in the text files. There are no such characters in the original text or in the patch file,
so I think this indicates that the text got mangled during patch application. After
applying the patch on Windows using TortoiseSVN, in lang.txt I see '?' for each Cyrillic, Greek,
Hebrew and Arabic character. In the original file on Windows I see various symbols. When
I look at the patch file directly in Windows, I see symbols. I've looked at lang.txt in the
TortoiseMerge tool, and regardless of what I set the default encoding to, the interesting
Unicode chars are mangled to '?'.
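
For illustration only, here is a minimal Java sketch of one way text can end up as '?':
encoding characters into a charset that cannot represent them. This is an assumption about
the kind of failure involved, not a diagnosis of what TortoiseSVN actually does. The class
and method names are hypothetical, and US-ASCII stands in for whatever charset the tool
may have used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    /**
     * Encodes the text to bytes in the given charset and decodes it again.
     * String.getBytes(Charset) replaces characters the charset cannot
     * represent with that charset's replacement byte ('?' for US-ASCII).
     */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Cyrillic zhe, Greek alpha, Hebrew alef (sample characters, not taken from lang.txt)
        String sample = "\u0436\u03b1\u05d0";

        // UTF-8 can represent every character, so the text survives unchanged.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // true

        // US-ASCII cannot, so each character comes back as '?'.
        System.out.println(roundTrip(sample, StandardCharsets.US_ASCII)); // ???
    }
}
```

If the files in the working copy contain literal '?' bytes rather than UTF-8 sequences, the
loss has already happened on disk and no viewer encoding setting will bring the text back.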

I've run out of ideas about how to apply the patch on Windows. What tool were you using to
apply the patch? Can you tell it that the patch file is UTF-8?

> Implement a Beider-Morse phonetic matching codec
> ------------------------------------------------
>                 Key: CODEC-125
>                 URL:
>             Project: Commons Codec
>          Issue Type: New Feature
>            Reporter: Matthew Pocock
>            Priority: Minor
>         Attachments: bm-gg.diff, bmpm.patch, bmpm.patch
> I have implemented Beider-Morse Phonetic Matching as a codec against the commons-codec
> svn trunk. I would like to contribute this to commons-codec.

This message is automatically generated by JIRA.
