commons-dev mailing list archives

From Oliver Heger <oliver.he...@oliver-heger.de>
Subject Re: [LANG] New class called StringAlgorithms?
Date Wed, 15 Jan 2014 20:44:48 GMT


On 15.01.2014 15:05, Benedikt Ritter wrote:
> 2014/1/15 Gary Gregory <garydgregory@gmail.com>
> 
>>  On Wed, Jan 15, 2014 at 8:06 AM, Benedikt Ritter <britter@apache.org>
>> wrote:
>>
>>> Hi Gary,
>>>
>>> 2014/1/15 Gary Gregory <garydgregory@gmail.com>
>>>
>>>> On Wed, Jan 15, 2014 at 7:00 AM, Benedikt Ritter <britter@apache.org>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> we currently have StringUtils.getLevenshteinDistance. LANG-944 [1] is
>>>> about
>>>>> introducing a new string algorithm called Jaro Winkler Distance [2].
>>>> Since
>>>>> StringUtils already does a lot of things, I'm wondering if it may
>> make
>>>>> sense to introduce a new class that serves as a host for more string
>>>>> algorithms to come. It would look something like:
>>>>>
>>>>> StringAlgorithms.levenshteinDistance(str1, str2);
>>>>> StringAlgorithms.jaroWinklerDistance(str1, str2);
>>>>>
>>>>> We would deprecate StringUtils.getLevenshteinDistance and delegate to
>>> the
>>>>> new class. It could be removed from StringUtils in the next major
>>>> release.
>>>>>
>>>>
>>>>> Thoughts?
>>>>>
>>>>
>>>> Yuck!
>>>>
>>>> I'd rather have one class per algo which reminds me that [codec] might
>>> be
>>>> a better place for things like this that 'encode' strings into
>> something
>>>> else.
>>>>
>>>
>>> Both methods return a double value modeling some kind of score. They do
>> not
>>> encode. Maybe StringAlgorithms is the wrong name? How about StringScore
>> or
>>> something like that?
>>>
>>
>> Still wrong IMO and not OO. A single class will become another
>> dumping-ground/kitchen-sink like StringUtils. I would not want to see one
>> algo be a one method one liner impl and another algo be a complex 20 method
>> job. I guess we could organize algos using nested classes like
>> StringFoo.BarAlgo but that's not ideal. All algo classes in a new pkg is
>> another way to go.
>>
> 
> We already have o.a.c.lang3.text, maybe this would fit?
> 
> What I want to avoid is something like:
> 
> LevenshteinDistance algo = new LevenshteinDistance();
> double dist = algo.getDistance(str1, str2);
> 
> If those algorithms don't have a state, it doesn't make sense to force
> creation of an object. I like the idea of internal classes.

IIUC, both algorithms do the same thing - calculating the difference (or
similarity) of two strings - using different methods.

So another option would be to extract a common interface
(StringDifferenceMetric?) and provide the algorithms as concrete
implementations.

A concrete use case could be a query engine which allows customizing its
string matching algorithm.

If you want to avoid instantiating algorithm classes with no state, we
could have an enum with constants representing the available algorithms.
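
A minimal sketch of how the interface and the enum could fit together. Everything here is an assumption for illustration: StringDifferenceMetric is only the tentative name floated above, and StringDifferenceMetrics, FuzzyLookup and closest are hypothetical names, not existing [lang] API. Both constants return 0 for equal inputs and larger values for more different inputs, but on different scales (an edit count versus a value in [0, 1]), so results from different metrics should not be compared with each other.

```java
import java.util.Arrays;
import java.util.List;

/** Tentative name from this thread; not an existing [lang] type. */
interface StringDifferenceMetric {
    /** Returns 0 for equal inputs; larger values mean the inputs differ more. */
    double difference(CharSequence left, CharSequence right);
}

/** Stateless algorithms as enum constants, so callers never instantiate anything. */
enum StringDifferenceMetrics implements StringDifferenceMetric {

    /** Number of single-character insertions, deletions and substitutions. */
    LEVENSHTEIN {
        @Override
        public double difference(CharSequence left, CharSequence right) {
            int[] previous = new int[right.length() + 1];
            int[] current = new int[right.length() + 1];
            for (int j = 0; j <= right.length(); j++) {
                previous[j] = j;
            }
            for (int i = 1; i <= left.length(); i++) {
                current[0] = i;
                for (int j = 1; j <= right.length(); j++) {
                    int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                    current[j] = Math.min(Math.min(current[j - 1] + 1, previous[j] + 1),
                            previous[j - 1] + cost);
                }
                int[] swap = previous; // reuse the two rows instead of reallocating
                previous = current;
                current = swap;
            }
            return previous[right.length()];
        }
    },

    /** One minus the Jaro-Winkler similarity, so the result lies in [0, 1]. */
    JARO_WINKLER {
        @Override
        public double difference(CharSequence left, CharSequence right) {
            return 1.0 - jaroWinklerSimilarity(left, right);
        }
    };

    static double jaroWinklerSimilarity(CharSequence a, CharSequence b) {
        if (a.length() == 0 && b.length() == 0) {
            return 1.0;
        }
        int window = Math.max(0, Math.max(a.length(), b.length()) / 2 - 1);
        boolean[] aMatched = new boolean[a.length()];
        boolean[] bMatched = new boolean[b.length()];
        int matches = 0;
        for (int i = 0; i < a.length(); i++) {
            int to = Math.min(b.length() - 1, i + window);
            for (int j = Math.max(0, i - window); j <= to; j++) {
                if (!bMatched[j] && a.charAt(i) == b.charAt(j)) {
                    aMatched[i] = true;
                    bMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        int outOfOrder = 0; // matched characters that appear in a different order
        int k = 0;
        for (int i = 0; i < a.length(); i++) {
            if (!aMatched[i]) {
                continue;
            }
            while (!bMatched[k]) {
                k++;
            }
            if (a.charAt(i) != b.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        double jaro = (m / a.length() + m / b.length() + (m - outOfOrder / 2.0) / m) / 3.0;
        int prefix = 0; // Winkler bonus for a shared prefix of up to four characters
        int maxPrefix = Math.min(4, Math.min(a.length(), b.length()));
        while (prefix < maxPrefix && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}

/** Stand-in for the query engine use case: the metric is supplied by the caller. */
final class FuzzyLookup {
    static String closest(String query, List<String> candidates, StringDifferenceMetric metric) {
        String best = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (String candidate : candidates) {
            double score = metric.difference(query, candidate);
            if (score < bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("sitting", "mitten", "kitchen");
        System.out.println(closest("kitten", words, StringDifferenceMetrics.LEVENSHTEIN));
    }
}
```

With this shape a caller picks an algorithm by passing a constant, and code that only depends on the interface stays open to metrics defined elsewhere.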

Oliver

> 
> 
>>
>> Gary
>>
>>
>>>
>>>
>>>>
>>>> Gary
>>>>
>>>>
>>>>> Benedikt
>>>>>
>>>>> [1] https://issues.apache.org/jira/i#browse/LANG-944
>>>>> [2] http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
>>>>>
>>>>> --
>>>>> http://people.apache.org/~britter/
>>>>> http://www.systemoutprintln.de/
>>>>> http://twitter.com/BenediktRitter
>>>>> http://github.com/britter
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> E-Mail: garydgregory@gmail.com | ggregory@apache.org
>>>> Java Persistence with Hibernate, Second Edition<
>>>> http://www.manning.com/bauer3/>
>>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
>>>> Spring Batch in Action <http://www.manning.com/templier/>
>>>> Blog: http://garygregory.wordpress.com
>>>> Home: http://garygregory.com/
>>>> Tweet! http://twitter.com/GaryGregory
>>>>


