commons-dev mailing list archives

From Benedikt Ritter <benerit...@gmail.com>
Subject Re: [PROPOSAL] Setup up new sandbox component "Commons Text" with git as primary vcs (Was: Re: [sandbox] New sandbox component)
Date Sun, 09 Nov 2014 12:12:56 GMT
2014-11-09 13:03 GMT+01:00 Gary Gregory <garydgregory@gmail.com>:

> I'm not sure this is correct. We could have a git based sandbox instead of
> svn.
>
> The Sandbox is one big pile, not individual repos. If commons text
> graduates, then it can be moved to its own repo.
>

This is not a good idea, since git is built around branching and merging.
Having several unrelated projects in the same git repo leads to merge
problems in areas you haven't touched, since you can only branch the whole
repository and not individual subfolders. What problems do you see? Too many
inactive/dead git repos in the sandbox?

Benedikt


>
> Gary
>
> -------- Original message --------
> From: Benedikt Ritter <britter@apache.org>
> Date: 11/09/2014 06:32 (GMT-05:00)
> To: Commons Developers List <dev@commons.apache.org>
> Subject: Re: [PROPOSAL] Setup up new sandbox component "Commons Text" with git as primary vcs (Was: Re: [sandbox] New sandbox component)
>
> INFRA issue is https://issues.apache.org/jira/browse/INFRA-8595
>
> 2014-11-07 16:17 GMT+01:00 Siegfried Goeschl <siegfried.goeschl@it20one.com>:
>
> > +1
> >
> > Cheers,
> >
> > Siegfried Goeschl
> >
> > > On 07 Nov 2014, at 09:47, Benedikt Ritter <britter@apache.org> wrote:
> > >
> > > Hi all,
> > >
> > > > as discussed, we'd like to create a new component which is focused on
> > > > algorithms for string/text processing.
> > > >
> > > > We (= Bruno and I) would like to create this new component with git as
> > > > primary vcs right away, which will make Commons Text the second Commons
> > > > component to use git. Please let me know if you have objections to
> > > > this. I'll open an INFRA ticket for the new git repo this weekend.
> > >
> > > Thanks!
> > > Benedikt
> > >
> > > 2014-10-27 12:57 GMT+01:00 Benedikt Ritter <britter@apache.org>:
> > >
> > >>
> > >>
> > >> 2014-10-27 12:32 GMT+01:00 Bruno P. Kinoshita <brunodepaulak@yahoo.com.br>:
> > >>
> > >>> Hi Benedikt!
> > >>>> Just let me know if you need help with the bootstrapping of the new
> > >>>> project.
> > >>> Yes, please :)
> > >>>
> > >>
> > >> I'll give folks some more time to share their thoughts about this and
> > >> create the new project then.
> > >>
> > >>
> > >>>
> > >>>> Maybe we should even announce this on announce@. There may be other
> > >>>> projects interested in a library like this (for example Apache Tika [1])
> > >>> Good idea! Should we drop a note there once the project has been created
> > >>> or after we already have some code in there?
> > >>>
> > >>
> > >> The latter seems appropriate to me.
> > >>
> > >>
> > >>>
> > >>> Thanks!
> > >>> Bruno
> > >>>
> > >>>
> > >>> From: Benedikt Ritter <britter@apache.org>
> > >>> To: Commons Developers List <dev@commons.apache.org>; Bruno P. Kinoshita <brunodepaulak@yahoo.com.br>
> > >>> Sent: Monday, October 27, 2014 5:45 AM
> > >>> Subject: Re: [sandbox] New sandbox component
> > >>>
> > >>> No objections from my side. I think this is a good idea. Just let me know
> > >>> if you need help with the bootstrapping of the new project. Maybe we should
> > >>> even announce this on announce@. There may be other projects interested
> > >>> in a library like this (for example Apache Tika [1]).
> > >>>
> > >>> Benedikt
> > >>>
> > >>> [1] http://tika.apache.org/
> > >>>
> > >>>
> > >>>
> > >>> 2014-10-27 0:41 GMT+01:00 Bruno P. Kinoshita <brunodepaulak@yahoo.com.br>:
> > >>>
> > >>> Hello all,
> > >>>
> > >>> At the moment I'm working with data matching and record linkage, and had
> > >>> to port some existing string comparison algorithms found in several open
> > >>> source projects (fuzzy-search-tools, simmetrics, lingpipe, [lang], [codec]).
> > >>>
> > >>> At that time I noticed LANG-591 [1], which suggests a more complex
> > >>> levenshtein distance algorithm. There are several other algorithms too
> > >>> (damerau-levenshtein, jaro, jaro-winkler, jaccard, bitap, q-gram, soundex,
> > >>> metaphone). Instead of trying to put them all in, say, [lang], I'd like to
> > >>> experiment with a new [text] component in the sandbox, if there are no
> > >>> objections.
> > >>>
> > >>> I will take a look at the existing code and its license, but most of
> > >>> these algorithms have good Wiki pages with pseudo code available, as well
> > >>> as academic papers.
> > >>>
> > >>> Maybe this component could be useful for other projects like [lang],
> > >>> Lucene, larsga/Duke, and Talend Open Studio. And even though my initial use
> > >>> case for this would be string comparison, I think it could support other
> > >>> use cases too.
> > >>>
> > >>> Thoughts on this? Anyone else interested in such a component?
> > >>>
> > >>> Thanks!
> > >>> Bruno
> > >>>
> > >>> [1] https://issues.apache.org/jira/browse/LANG-591
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> http://people.apache.org/~britter/
> > >>> http://www.systemoutprintln.de/
> > >>> http://twitter.com/BenediktRitter
> > >>> http://github.com/britter
> > >>>
> > >>
> > >
> > >
> > > --
> > > http://people.apache.org/~britter/
> > > http://www.systemoutprintln.de/
> > > http://twitter.com/BenediktRitter
> > > http://github.com/britter
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
>
>
>
