commons-issues mailing list archives

From "Gary D. Gregory (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CODEC-125) Implement a Beider-Morse phonetic matching codec
Date Mon, 25 Jul 2011 15:46:09 GMT

    [ https://issues.apache.org/jira/browse/CODEC-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070559#comment-13070559 ]

Gary D. Gregory commented on CODEC-125:
---------------------------------------

To me, it looks like this test:

java.lang.AssertionError: language predicted for name 'Renault' is wrong: [] should contain 'french'
	at org.junit.Assert.fail(Assert.java:91)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.apache.commons.codec.language.bm.LanguageGuessingTest.testLanguageGuessing(LanguageGuessingTest.java:84)

fails because this bm.Lang method:

    public Set<String> guessLanguages(String text)
    {
        text = text.toLowerCase(); // todo: locale?
//        System.out.println("Testing text: '" + text + "'");

        Set<String> langs = new HashSet<String>(languages.getLanguages());
        for(LangRule rule : rules)
        {
            if(rule.matches(text))
            {
//                System.out.println("Rule " + rule.pattern + " matches " + text);
                if(rule.acceptOnMatch) {
//                    System.out.println("Retaining " + rule.languages);
                    langs.retainAll(rule.languages);
                }
                else {
//                    System.out.println("Removing " + rule.languages);
                    langs.removeAll(rule.languages);
                }
//                System.out.println("Current languages: " + langs);
            }
            else
            {
//                System.out.println("Rule " + rule.pattern + " does not match " + text);
            }
        }

        return langs;
    }

returns an empty set. The set starts out holding every known language; the loop then narrows it with retainAll and removeAll, and it finishes empty.


Could rule order be an issue? Or a difference in regular expression interpretation between Java 5 and 6? I am on Java 6.
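
To make the narrowing easier to reason about, here is a minimal sketch of the same retainAll/removeAll loop. It is not the code from the patch: the class LangGuessSketch, its Rule type, the three-language set, and the two regex rules are all hypothetical, and find() is assumed as the matching semantics. It shows one way the set can end up empty (two accepting rules whose language sets do not overlap). It also suggests, tentatively, that rule order alone may not be the cause: retainAll and removeAll are both set intersections, so the final set depends on which rules match, not on the order in which they are applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class LangGuessSketch {

    /** Hypothetical stand-in for the patch's LangRule; not the real class. */
    public static final class Rule {
        final Pattern pattern;
        final boolean acceptOnMatch;
        final Set<String> languages;

        public Rule(String regex, boolean acceptOnMatch, String... languages) {
            this.pattern = Pattern.compile(regex);
            this.acceptOnMatch = acceptOnMatch;
            this.languages = new HashSet<String>(Arrays.asList(languages));
        }
    }

    /** Same narrowing loop as guessLanguages, over a caller-supplied rule list. */
    public static Set<String> guess(String text, Set<String> all, List<Rule> rules) {
        Set<String> langs = new HashSet<String>(all); // copy, so 'all' is not modified
        for (Rule rule : rules) {
            // Assumption: a rule matches if its pattern is found anywhere in the text.
            if (rule.pattern.matcher(text).find()) {
                if (rule.acceptOnMatch) {
                    langs.retainAll(rule.languages); // keep only the rule's languages
                } else {
                    langs.removeAll(rule.languages); // drop the rule's languages
                }
            }
        }
        return langs;
    }

    public static void main(String[] args) {
        Set<String> all = new HashSet<String>(Arrays.asList("english", "french", "german"));
        // Made-up rules: both match "renault", but accept disjoint language sets.
        List<Rule> rules = Arrays.asList(
                new Rule("ault$", true, "french"),
                new Rule("au", true, "english", "german"));

        System.out.println(guess("renault", all, rules));

        // Applying the same rules in reverse order gives the same final set.
        List<Rule> reversed = new ArrayList<Rule>(rules);
        Collections.reverse(reversed);
        System.out.println(guess("renault", all, reversed));
    }
}
```

If this reasoning carries over to the real rule files, re-enabling the commented-out println calls for 'renault' and checking which rules match on each JVM may be more informative than reordering the rules.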

> Implement a Beider-Morse phonetic matching codec
> ------------------------------------------------
>
>                 Key: CODEC-125
>                 URL: https://issues.apache.org/jira/browse/CODEC-125
>             Project: Commons Codec
>          Issue Type: New Feature
>            Reporter: Matthew Pocock
>            Priority: Minor
>         Attachments: bm-gg.diff, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch
>
>
> I have implemented Beider Morse Phonetic Matching as a codec against the commons-codec svn trunk. I would like to contribute this to commons-codec.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
