commons-dev mailing list archives

From Benedikt Ritter <brit...@apache.org>
Subject Re: [LANG] New class called StringAlgorithms?
Date Fri, 17 Jan 2014 12:11:18 GMT
2014/1/15 Oliver Heger <oliver.heger@oliver-heger.de>

>
>
> On 15.01.2014 15:05, Benedikt Ritter wrote:
> > 2014/1/15 Gary Gregory <garydgregory@gmail.com>
> >
> >>  On Wed, Jan 15, 2014 at 8:06 AM, Benedikt Ritter <britter@apache.org>
> >> wrote:
> >>
> >>> Hi Gary,
> >>>
> >>> 2014/1/15 Gary Gregory <garydgregory@gmail.com>
> >>>
> >>>> On Wed, Jan 15, 2014 at 7:00 AM, Benedikt Ritter <britter@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> we currently have StringUtils.getLevenshteinDistance. LANG-944 [1] is
> >>>>> about introducing a new string algorithm called Jaro Winkler Distance
> >>>>> [2]. Since StringUtils already does a lot of things, I'm wondering if
> >>>>> it may make sense to introduce a new class that serves as a host for
> >>>>> more string algorithms to come. It would look something like:
> >>>>>
> >>>>> StringAlgorithms.levenshteinDistance(str1, str2);
> >>>>> StringAlgorithms.jaroWinklerDistance(str1, str2);
> >>>>>
> >>>>> We would deprecate StringUtils.getLevenshteinDistance and delegate to
> >>>>> the new class. It could be removed from StringUtils in the next major
> >>>>> release.
> >>>>>
> >>>>
> >>>>> Thoughts?
> >>>>>
> >>>>
> >>>> Yuck!
> >>>>
> >>>> I'd rather have one class per algo, which reminds me that [codec] might
> >>>> be a better place for things like this that 'encode' strings into
> >>>> something else.
> >>>>
> >>>
> >>> Both methods return a double value modeling some kind of score. They do
> >>> not encode. Maybe StringAlgorithms is the wrong name? How about
> >>> StringScore or something like that?
> >>>
> >>
> >> Still wrong IMO and not OO. A single class will become another
> >> dumping-ground/kitchen-sink like StringUtils. I would not want to see one
> >> algo be a one-method, one-liner impl and another algo be a complex
> >> 20-method job. I guess we could organize algos using nested classes like
> >> StringFoo.BarAlgo, but that's not ideal. All algo classes in a new pkg is
> >> another way to go.
> >>
> >
> > We already have o.a.c.lang3.text, maybe this would fit?
> >
> > What I want to avoid is something like:
> >
> > LevenshteinDistance algo = new LevenshteinDistance();
> > double dist = algo.getDistance(str1, str2);
> >
> > If those algorithms don't have a state, it doesn't make sense to force
> > creation of an object. I like the idea of internal classes.
>
> IIUC, both algorithms do the same thing - calculating the difference (or
> similarity) of two strings - using different methods.
>
> So another option would be to extract a common interface
> (StringDifferenceMetric?) and provide the algorithms as concrete
> implementations.
>

This is a possible approach, but a very specific one (= tied to distance
measuring). I think it is a good idea to create very specific utilities
instead of generic ones like StringUtils that can do a variety of things.
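
To make the interface idea concrete, here is a rough sketch of how it could
look. StringDifferenceMetric is the name floated above; the getDistance
signature and the LevenshteinMetric class are assumptions for illustration
only and are not existing Commons Lang API.

```java
// Hypothetical sketch: only the interface name comes from the thread,
// the method signature and the implementing class are assumptions.
interface StringDifferenceMetric {
    double getDistance(CharSequence left, CharSequence right);
}

// One possible implementation: classic Levenshtein edit distance,
// computed with two rolling rows of the dynamic-programming table.
final class LevenshteinMetric implements StringDifferenceMetric {
    @Override
    public double getDistance(CharSequence left, CharSequence right) {
        int[] prev = new int[right.length() + 1];
        int[] curr = new int[right.length() + 1];
        // Distance from the empty prefix of 'left' to each prefix of 'right'.
        for (int j = 0; j <= right.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= left.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so 'prev' always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[right.length()];
    }
}
```

A caller would write something like
new LevenshteinMetric().getDistance("kitten", "sitting"), which is exactly
the forced instantiation discussed earlier in the thread.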


>
> A concrete use case could be a query engine which allows customizing its
> string matching algorithm.
>

Is this really a use case? It sounds very contrived to me. Have you ever
thought "I'd like to search on Google, but I'd like suggestions to be
matched using the Levenshtein Distance algorithm"?


>
> If you want to avoid instantiating algorithm classes with no state, we
> could have an enum with constants representing the available algorithms.
>

I still favor specific methods over an additional parameter.
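
For comparison, a sketch of both styles side by side. All names here
(StringMetric, StringScores, score, jaroWinklerSimilarity) are hypothetical
and chosen only for illustration. The Jaro Winkler part uses the common
textbook formulation (prefix scale 0.1, prefix capped at 4 characters),
which is an assumption and not necessarily what LANG-944 proposes.

```java
// Hypothetical sketch: none of these names exist in Commons Lang.
enum StringMetric {
    LEVENSHTEIN {
        @Override
        double score(CharSequence a, CharSequence b) {
            return StringScores.levenshteinDistance(a, b);
        }
    },
    JARO_WINKLER {
        @Override
        double score(CharSequence a, CharSequence b) {
            return StringScores.jaroWinklerSimilarity(a, b);
        }
    };

    abstract double score(CharSequence a, CharSequence b);
}

// The "specific methods" style: stateless static methods, no instances.
final class StringScores {
    private StringScores() { }

    static int levenshteinDistance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static double jaroWinklerSimilarity(CharSequence a, CharSequence b) {
        if (a.length() == 0 || b.length() == 0) {
            return a.length() == b.length() ? 1.0 : 0.0;
        }
        // Characters match only if they are equal and close enough in position.
        int window = Math.max(0, Math.max(a.length(), b.length()) / 2 - 1);
        boolean[] aMatched = new boolean[a.length()];
        boolean[] bMatched = new boolean[b.length()];
        int matches = 0;
        for (int i = 0; i < a.length(); i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(b.length(), i + window + 1);
            for (int j = from; j < to; j++) {
                if (!bMatched[j] && a.charAt(i) == b.charAt(j)) {
                    aMatched[i] = true;
                    bMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < a.length(); i++) {
            if (!aMatched[i]) {
                continue;
            }
            while (!bMatched[k]) {
                k++;
            }
            if (a.charAt(i) != b.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        double jaro = (m / a.length() + m / b.length()
                + (m - outOfOrder / 2.0) / m) / 3.0;
        // Winkler boost for a shared prefix of up to four characters.
        int prefix = 0;
        int maxPrefix = Math.min(4, Math.min(a.length(), b.length()));
        while (prefix < maxPrefix && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}
```

The call sites then read StringMetric.LEVENSHTEIN.score(str1, str2) versus
StringScores.levenshteinDistance(str1, str2). Note that the two scores point
in opposite directions: a lower Levenshtein value means more similar, while a
higher Jaro Winkler value means more similar, so swapping the enum constant
alone does not give a drop-in replacement.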


>
> Oliver
>
> >
> >
> >>
> >> Gary
> >>
> >>
> >>>
> >>>
> >>>>
> >>>> Gary
> >>>>
> >>>>
> >>>>> Benedikt
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/i#browse/LANG-944
> >>>>> [2] http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
> >>>>>
> >>>>> --
> >>>>> http://people.apache.org/~britter/
> >>>>> http://www.systemoutprintln.de/
> >>>>> http://twitter.com/BenediktRitter
> >>>>> http://github.com/britter
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> E-Mail: garydgregory@gmail.com | ggregory@apache.org
> >>>> Java Persistence with Hibernate, Second Edition<
> >>>> http://www.manning.com/bauer3/>
> >>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
> >>>> Spring Batch in Action <http://www.manning.com/templier/>
> >>>> Blog: http://garygregory.wordpress.com
> >>>> Home: http://garygregory.com/
> >>>> Tweet! http://twitter.com/GaryGregory
> >>>>
>


-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter
