commons-dev mailing list archives

From Benedikt Ritter <brit...@apache.org>
Subject Re: [PROPOSAL] Setup up new sandbox component "Commons Text" with git as primary vcs (Was: Re: [sandbox] New sandbox component)
Date Sun, 09 Nov 2014 11:32:40 GMT
INFRA issue is https://issues.apache.org/jira/browse/INFRA-8595

2014-11-07 16:17 GMT+01:00 Siegfried Goeschl <siegfried.goeschl@it20one.com>:

> +1
>
> Cheers,
>
> Siegfried Goeschl
>
> > On 07 Nov 2014, at 09:47, Benedikt Ritter <britter@apache.org> wrote:
> >
> > Hi all,
> >
> > as discussed, we'd like to create a new component which is focused on
> > algorithms for string/text processing.
> >
> > We (= Bruno and I) would like to create this new component with git as
> > primary vcs right away, which will make Commons Text the second Commons
> > component to use git. Please let me know if you have objections against
> > this. I'll open an INFRA ticket for the new git repo this weekend.
> >
> > Thanks!
> > Benedikt
> >
> > 2014-10-27 12:57 GMT+01:00 Benedikt Ritter <britter@apache.org>:
> >
> >>
> >>
> >> 2014-10-27 12:32 GMT+01:00 Bruno P. Kinoshita <brunodepaulak@yahoo.com.br>:
> >>
> >>> Hi Benedikt!
> >>>> Just let me know if you need help with the bootstrapping of the new
> >>>> project.
> >>> Yes, please :)
> >>>
> >>
> >> I'll give folks some more time to share their thoughts about this and
> >> create the new project then.
> >>
> >>
> >>>
> >>>> Maybe we should even announce this on announce@. There may be other
> >>>> projects interested in a library like this (for example Apache Tika [1])
> >>> Good idea! Should we drop a note there once the project has been created
> >>> or after we already have some code in there?
> >>>
> >>
> >> The latter seems appropriate to me.
> >>
> >>
> >>>
> >>> Thanks!
> >>> Bruno
> >>>
> >>>
> >>>      From: Benedikt Ritter <britter@apache.org>
> >>> To: Commons Developers List <dev@commons.apache.org>; Bruno P.
> >>> Kinoshita <brunodepaulak@yahoo.com.br>
> >>> Sent: Monday, October 27, 2014 5:45 AM
> >>> Subject: Re: [sandbox] New sandbox component
> >>>
> >>> No objections from my side. I think this is a good idea. Just let me know
> >>> if you need help with the bootstrapping of the new project. Maybe we should
> >>> even announce this on announce@. There may be other projects interested
> >>> in a library like this (for example Apache Tika [1])
> >>>
> >>> Benedikt
> >>>
> >>> [1] http://tika.apache.org/
> >>>
> >>>
> >>>
> >>> 2014-10-27 0:41 GMT+01:00 Bruno P. Kinoshita <brunodepaulak@yahoo.com.br>:
> >>>
> >>> Hello all,
> >>>
> >>> At the moment I'm working with data matching and record linkage, and had
> >>> to port some existing string comparison algorithms found in several open
> >>> source projects (fuzzy-search-tools, simmetrics, lingpipe, [lang], [codec]).
> >>>
> >>> At that time I noticed LANG-591 [1], which suggests a more complex
> >>> levenshtein distance algorithm. There are several other algorithms too
> >>> (damerau-levenshtein, jaro, jaro-winkler, jaccard, bitap, q-gram, soundex,
> >>> metaphone). Instead of trying to put them all in, say, [lang], I'd like to
> >>> experiment with a new [text] component in the sandbox, if there are no
> >>> objections.
> >>>
> >>> I will take a look at the existing code and its license, but most of
> >>> these algorithms have good Wiki pages with pseudo code available, as well
> >>> as academic papers.
> >>>
> >>> Maybe this component could be useful for other projects like [lang],
> >>> Lucene, larsga/Duke, and Talend Open Studio. And even though my initial use
> >>> case for this would be string comparison, I think it could support other
> >>> use cases too.
> >>>
> >>> Thoughts on this? Anyone else interested in such a component?
> >>>
> >>> Thanks!
> >>> Bruno
> >>>
> >>> [1] https://issues.apache.org/jira/browse/LANG-591
> >>>
> >>>
> >>>
> >>> --
> >>> http://people.apache.org/~britter/
> >>> http://www.systemoutprintln.de/
> >>> http://twitter.com/BenediktRitter
> >>> http://github.com/britter
> >>>
> >>
> >
> >
> > --
> > http://people.apache.org/~britter/
> > http://www.systemoutprintln.de/
> > http://twitter.com/BenediktRitter
> > http://github.com/britter


-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter
