commons-dev mailing list archives

From Julien Aymé <julien.a...@gmail.com>
Subject Re: [LANG] New class called StringAlgorithms?
Date Fri, 17 Jan 2014 12:26:18 GMT
More on Benedikt's idea:

<quote>
What I want to avoid is something like:

LevenshteinDistance algo = new LevenshteinDistance();
double dist = algo.getDistance(str1, str2);
</quote>

If the algorithm is stateless, we can provide a public static final
LevenshteinDistance INSTANCE.
In that case, the code would become:
double dist = LevenshteinDistance.INSTANCE.getDistance(str1, str2);

This would be OO, while allowing other algorithms to be added later
(these may or may not be stateless, and the code would go in their own classes).
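
To make the idea concrete, here is a minimal sketch of what such a stateless
class could look like. It is an illustration only, not existing Commons Lang
code: the private constructor, the CharSequence parameters and the two-row
implementation are my assumptions; only the class name, INSTANCE and
getDistance come from the discussion above.

```java
import java.util.Objects;

/** Hypothetical sketch of a stateless algorithm class; not an existing Commons Lang API. */
final class LevenshteinDistance {

    /** Shared instance; safe to share because the class holds no mutable state. */
    public static final LevenshteinDistance INSTANCE = new LevenshteinDistance();

    /** Assumption: a private constructor steers callers towards INSTANCE. */
    private LevenshteinDistance() {
    }

    /** Returns the edit distance as a double, matching the snippets in this thread. */
    public double getDistance(CharSequence left, CharSequence right) {
        Objects.requireNonNull(left, "left");
        Objects.requireNonNull(right, "right");

        // Two rows of the classic dynamic-programming table are enough.
        int[] previous = new int[right.length() + 1];
        int[] current = new int[right.length() + 1];
        for (int j = 0; j <= right.length(); j++) {
            previous[j] = j; // distance from the empty prefix of left
        }
        for (int i = 1; i <= left.length(); i++) {
            current[0] = i; // distance to the empty prefix of right
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            // Swap rows so that 'previous' always holds the last completed row.
            int[] tmp = previous;
            previous = current;
            current = tmp;
        }
        return previous[right.length()];
    }
}
```

A stateful algorithm added later could keep a public constructor in its own
class instead, without affecting callers of this one.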
WDYT?

Julien

2014/1/17 Benedikt Ritter <britter@apache.org>:
> 2014/1/15 Oliver Heger <oliver.heger@oliver-heger.de>
>
>>
>>
>> On 15.01.2014 15:05, Benedikt Ritter wrote:
>> > 2014/1/15 Gary Gregory <garydgregory@gmail.com>
>> >
>> >>  On Wed, Jan 15, 2014 at 8:06 AM, Benedikt Ritter <britter@apache.org>
>> >> wrote:
>> >>
>> >>> Hi Gary,
>> >>>
>> >>> 2014/1/15 Gary Gregory <garydgregory@gmail.com>
>> >>>
>> >>>> On Wed, Jan 15, 2014 at 7:00 AM, Benedikt Ritter <britter@apache.org>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> we currently have StringUtils.getLevenshteinDistance. LANG-944 [1] is
>> >>>>> about introducing a new string algorithm called Jaro Winkler Distance
>> >>>>> [2]. Since StringUtils already does a lot of things, I'm wondering if
>> >>>>> it may make sense to introduce a new class that serves as a host for
>> >>>>> more string algorithms to come. It would look something like:
>> >>>>>
>> >>>>> StringAlgorithms.levenshteinDistance(str1, str2);
>> >>>>> StringAlgorithms.jaroWinklerDistance(str1, str2);
>> >>>>>
>> >>>>> We would deprecate StringUtils.getLevenshteinDistance and delegate to
>> >>>>> the new class. It could be removed from StringUtils in the next major
>> >>>>> release.
>> >>>>>
>> >>>>
>> >>>>> Thoughts?
>> >>>>>
>> >>>>
>> >>>> Yuck!
>> >>>>
>> >>>> I'd rather have one class per algo, which reminds me that [codec] might
>> >>>> be a better place for things like this that 'encode' strings into
>> >>>> something else.
>> >>>>
>> >>>
>> >>> Both methods return a double value modeling some kind of score. They do
>> >>> not encode. Maybe StringAlgorithms is the wrong name? How about
>> >>> StringScore or something like that?
>> >>>
>> >>
>> >> Still wrong IMO and not OO. A single class will become another
>> >> dumping-ground/kitchen-sink like StringUtils. I would not want to see one
>> >> algo be a one method one liner impl and another algo be a complex 20
>> >> method job. I guess we could organize algos using nested classes like
>> >> StringFoo.BarAlgo but that's not ideal. All algo classes in a new pkg is
>> >> another way to go.
>> >>
>> >
>> > We already have o.a.c.lang3.text, maybe this would fit?
>> >
>> > What I want to avoid is something like:
>> >
>> > LevenshteinDistance algo = new LevenshteinDistance();
>> > double dist = algo.getDistance(str1, str2);
>> >
>> > If those algorithms don't have a state, it doesn't make sense to force
>> > creation of an object. I like the idea of internal classes.
>>
>> IIUC, both algorithms do the same thing - calculating the difference (or
>> similarity) of two strings - using different methods.
>>
>> So another option would be to extract a common interface
>> (StringDifferenceMetric?) and provide the algorithms as concrete
>> implementations.
>>
>
> This is a possible, but very specific (= tied to distance measuring)
> approach. I think it is a good idea to create very specific utilities
> instead of generic ones like StringUtils, that can do a variety of things.
>
>
>>
>> A concrete use case could be a query engine which allows customizing its
>> string matching algorithm.
>>
>
> Is this really a use case? It sounds very constructed to me. Have you ever
> thought "I'd like to query on google, but I'd like suggestions to be
> matched using Levenshtein Distance algorithm"?
>
>
>>
>> If you want to avoid instantiating algorithm classes with no state, we
>> could have an enum with constants representing the available algorithms.
>>
>
> I still favor specific methods over an additional parameter.
>
>
>>
>> Oliver
>>
>> >
>> >
>> >>
>> >> Gary
>> >>
>> >>
>> >>>
>> >>>
>> >>>>
>> >>>> Gary
>> >>>>
>> >>>>
>> >>>>> Benedikt
>> >>>>>
>> >>>>> [1] https://issues.apache.org/jira/i#browse/LANG-944
>> >>>>> [2] http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
>> >>>>>
>> >>>>> --
>> >>>>> http://people.apache.org/~britter/
>> >>>>> http://www.systemoutprintln.de/
>> >>>>> http://twitter.com/BenediktRitter
>> >>>>> http://github.com/britter
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> E-Mail: garydgregory@gmail.com | ggregory@apache.org
>> >>>> Java Persistence with Hibernate, Second Edition <http://www.manning.com/bauer3/>
>> >>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
>> >>>> Spring Batch in Action <http://www.manning.com/templier/>
>> >>>> Blog: http://garygregory.wordpress.com
>> >>>> Home: http://garygregory.com/
>> >>>> Tweet! http://twitter.com/GaryGregory
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter


