Date: Thu, 30 Mar 2017 16:29:41 +0000 (UTC)
From: "Yossi Tamari (JIRA)"
To: issues@commons.apache.org
Subject: [jira] [Commented] (CODEC-199) Bug in HW rule in Soundex

    [ https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949349#comment-15949349 ]

Yossi Tamari commented on CODEC-199:
------------------------------------

In my mind, the fact that the HW rule is forced IS a bug, and it is one of the things I was trying to fix. Read [https://en.wikipedia.org/wiki/Soundex]. The class claims to implement Soundex, but it really implements American Soundex. I have no problem with that, but then we should change the class name.

I was under the impression that the constant (and the parameterized constructors) were there so that this could be called a generic Soundex implementation whose default behavior is American Soundex. That default is fine, since it is what most people want. But this was not the case before my second patch. The HW rule is part of American Soundex only, yet there was no way to disable it or to apply it to other letters. The original default was wrong, and changing it is a feature.
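To illustrate the difference between a forced and an optional HW rule, here is a minimal Java sketch. It is not the Commons Codec API. The class SoundexSketch, its encode(name, mapping, applyHwRule) method and the applyHwRule flag are hypothetical names used only for illustration. The sketch assumes a 26-character mapping string for A..Z in which '0' means "not coded", as in the mapping string quoted in this thread. With applyHwRule set to true, H and W are skipped without resetting the previous code. As a result, two letters with the same code that are separated only by H or W produce a single digit, which is the American Soundex behavior. With the flag set to false, H and W act like vowels and keep the two codes apart.

```java
import java.util.Locale;

/**
 * Hypothetical Soundex sketch (not the Commons Codec API).
 * The mapping holds one code per letter A..Z; '0' means the letter is not coded.
 */
final class SoundexSketch {

    /** American Soundex codes for A..Z (the mapping string quoted in this thread). */
    static final String US_MAPPING = "01230120022455012623010202";

    private SoundexSketch() {}

    /**
     * Encodes a name as its first letter followed by three digits.
     * applyHwRule == true: H and W are transparent, so equal codes separated
     * only by H/W collapse into one digit (American Soundex).
     * applyHwRule == false: H and W behave like vowels and separate equal codes.
     */
    static String encode(String name, String mapping, boolean applyHwRule) {
        // Keep only ASCII letters, upper-cased.
        String s = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(4).append(s.charAt(0));
        // Code of the previous letter; the first letter's code also suppresses
        // an identical code immediately after it.
        char last = mapping.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (applyHwRule && (c == 'H' || c == 'W')) {
                continue; // skip H/W entirely: 'last' is carried across them
            }
            char code = mapping.charAt(c - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code; // an uncoded letter resets 'last' to '0', so a repeat after it is kept
        }
        while (out.length() < 4) {
            out.append('0'); // pad to one letter plus three digits
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Ashcraft", US_MAPPING, true));  // A261
        System.out.println(encode("Ashcraft", US_MAPPING, false)); // A226
    }
}
```

In "Ashcraft", the S and the C both map to 2 and are separated by an H. The name therefore encodes as A261 with the rule and as A226 without it. A made-up input such as "Ashwl" covers the case from the original report, where two H/W letters in a row precede a letter with a different code. With the rule on, the sketch still codes the L and returns A240.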
When somebody passes "01230120022455012623010202" intentionally, the HW rule should be disabled. If you want to protect against the accidental case, we can rename the constants so that the caching will not work. If you think this change is too big for 1.11 and should only happen in 2.0, that is a fair argument. I personally don't think so at the moment, though I am not sure what the rules are here for behavior changes in a minor release.

> Bug in HW rule in Soundex
> -------------------------
>
>                 Key: CODEC-199
>                 URL: https://issues.apache.org/jira/browse/CODEC-199
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.10
>            Reporter: Yossi Tamari
>             Fix For: 1.11
>
>         Attachments: better.patch, soundex.patch
>
>
> The Soundex algorithm says that if two characters that map to the same code are separated by H or W, the second one is not encoded.
> However, in the implementation (in Soundex.getMappingCode() line 191), a character that is preceded by two characters that are either H or W is not encoded, regardless of what the last consonant was.
> Source: http://en.wikipedia.org/wiki/Soundex#American_Soundex

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)