From: Oliver Heger
Date: Wed, 15 Jan 2014 21:44:48 +0100
To: Commons Developers List
Subject: Re: [LANG] New class called StringAlgorithms?
Message-ID: <52D6F340.5070305@oliver-heger.de>

On 15.01.2014 15:05, Benedikt Ritter wrote:
> 2014/1/15 Gary Gregory
>
>> On Wed, Jan 15, 2014 at 8:06 AM, Benedikt Ritter wrote:
>>
>>> Hi Gary,
>>>
>>> 2014/1/15 Gary Gregory
>>>
>>>> On Wed, Jan 15, 2014 at 7:00 AM, Benedikt Ritter wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> we currently have StringUtils.getLevenshteinDistance. LANG-944 [1] is
>>>>> about introducing a new string algorithm called Jaro Winkler Distance
>>>>> [2]. Since StringUtils already does a lot of things, I'm wondering if
>>>>> it may make sense to introduce a new class that serves as a host for
>>>>> more string algorithms to come. It would look something like:
>>>>>
>>>>> StringAlgorithms.levenshteinDistance(str1, str2);
>>>>> StringAlgorithms.jaroWinklerDistance(str1, str2);
>>>>>
>>>>> We would deprecate StringUtils.getLevenshteinDistance and delegate to
>>>>> the new class. It could be removed from StringUtils in the next major
>>>>> release.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>
>>>> Yuck!
>>>>
>>>> I'd rather have one class per algo, which reminds me that [codec] might
>>>> be a better place for things like this that 'encode' strings into
>>>> something else.
>>>>
>>>
>>> Both methods return a double value modeling some kind of score. They do
>>> not encode. Maybe StringAlgorithms is the wrong name? How about
>>> StringScore or something like that?
>>>
>>
>> Still wrong IMO and not OO. A single class will become another
>> dumping-ground/kitchen-sink like StringUtils. I would not want to see one
>> algo be a one method one liner impl and another algo be a complex 20 method
>> job. I guess we could organize algos using nested classes like
>> StringFoo.BarAlgo but that's not ideal. All algo classes in a new pkg is
>> another way to go.
>>
>
> We already have o.a.c.lang3.text, maybe this would fit?
>
> What I want to avoid is something like:
>
> LevenshteinDistance algo = new LevenshteinDistance();
> double dist = algo.getDistance(str1, str2);
>
> If those algorithms don't have a state, it doesn't make sense to force
> creation of an object. I like the idea of internal classes.

IIUC, both algorithms do the same thing - calculating the difference (or
similarity) of two strings - using different methods. So another option
would be to extract a common interface (StringDifferenceMetric?) and
provide the algorithms as concrete implementations. A concrete use case
could be a query engine which allows customizing its string matching
algorithm.

If you want to avoid instantiating algorithm classes with no state, we
could have an enum with constants representing the available algorithms.

Oliver

>
>>
>> Gary
>>
>>>
>>>>
>>>> Gary
>>>>
>>>>> Benedikt
>>>>>
>>>>> [1] https://issues.apache.org/jira/i#browse/LANG-944
>>>>> [2] http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
>>>>>
>>>>> --
>>>>> http://people.apache.org/~britter/
>>>>> http://www.systemoutprintln.de/
>>>>> http://twitter.com/BenediktRitter
>>>>> http://github.com/britter
>>>>>
>>>>
>>>> --
>>>> E-Mail: garydgregory@gmail.com | ggregory@apache.org
>>>> Java Persistence with Hibernate, Second Edition <http://www.manning.com/bauer3/>
>>>> JUnit in Action, Second Edition
>>>> Spring Batch in Action
>>>> Blog: http://garygregory.wordpress.com
>>>> Home: http://garygregory.com/
>>>> Tweet!
>>>> http://twitter.com/GaryGregory
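
For illustration, the enum idea from the reply above (a common interface plus
constants for the stateless algorithms) might look roughly like the sketch
below. This is only a sketch under assumptions: the interface name comes from
the "StringDifferenceMetric?" suggestion, while the enum name StringMetrics,
the method name score, and both implementations are hypothetical and are not
code from LANG-944 or from Commons Lang. Note that the two scores point in
opposite directions: the Levenshtein value is a distance (0 means equal),
whereas this Jaro-Winkler value is a similarity (1.0 means equal).

```java
/** Hypothetical interface, named after the "StringDifferenceMetric?" idea. */
interface StringDifferenceMetric {
    /** Returns a score for two strings; its meaning depends on the metric. */
    double score(CharSequence left, CharSequence right);
}

/** Hypothetical enum: stateless algorithms as constants, nothing to instantiate. */
enum StringMetrics implements StringDifferenceMetric {

    /** Edit distance: count of single-character inserts, deletes, substitutions. */
    LEVENSHTEIN {
        @Override
        public double score(CharSequence left, CharSequence right) {
            int[] prev = new int[right.length() + 1];
            int[] curr = new int[right.length() + 1];
            for (int j = 0; j <= right.length(); j++) {
                prev[j] = j; // distance from the empty prefix of left
            }
            for (int i = 1; i <= left.length(); i++) {
                curr[0] = i;
                for (int j = 1; j <= right.length(); j++) {
                    int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                    curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                       prev[j - 1] + cost);
                }
                int[] tmp = prev; // swap the two rows
                prev = curr;
                curr = tmp;
            }
            return prev[right.length()];
        }
    },

    /** Jaro-Winkler similarity in [0, 1], with the usual 0.1 prefix scaling. */
    JARO_WINKLER {
        @Override
        public double score(CharSequence left, CharSequence right) {
            int n = left.length();
            int m = right.length();
            if (n == 0 && m == 0) {
                return 1.0;
            }
            int window = Math.max(0, Math.max(n, m) / 2 - 1);
            boolean[] leftMatched = new boolean[n];
            boolean[] rightMatched = new boolean[m];
            int matches = 0;
            for (int i = 0; i < n; i++) {
                int to = Math.min(m - 1, i + window);
                for (int j = Math.max(0, i - window); j <= to; j++) {
                    if (!rightMatched[j] && left.charAt(i) == right.charAt(j)) {
                        leftMatched[i] = true;
                        rightMatched[j] = true;
                        matches++;
                        break;
                    }
                }
            }
            if (matches == 0) {
                return 0.0;
            }
            // Matched characters that appear in a different order on each side.
            int k = 0;
            int outOfOrder = 0;
            for (int i = 0; i < n; i++) {
                if (!leftMatched[i]) {
                    continue;
                }
                while (!rightMatched[k]) {
                    k++;
                }
                if (left.charAt(i) != right.charAt(k)) {
                    outOfOrder++;
                }
                k++;
            }
            double mt = matches;
            double jaro = (mt / n + mt / m + (mt - outOfOrder / 2.0) / mt) / 3.0;
            // Winkler bonus for a shared prefix of up to four characters.
            int prefix = 0;
            int maxPrefix = Math.min(4, Math.min(n, m));
            while (prefix < maxPrefix && left.charAt(prefix) == right.charAt(prefix)) {
                prefix++;
            }
            return jaro + prefix * 0.1 * (1.0 - jaro);
        }
    }
}
```

A caller such as the query engine mentioned above could then accept any
StringDifferenceMetric, for example StringMetrics.JARO_WINKLER, and invoke
metric.score(str1, str2) without creating an algorithm object.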