commons-dev mailing list archives

From Luc Maisonobe <...@spaceroots.org>
Subject Re: [PROPOSAL] Setup up new sandbox component "Commons Text" with git as primary vcs (Was: Re: [sandbox] New sandbox component)
Date Fri, 07 Nov 2014 13:01:56 GMT
On 07/11/2014 12:55, Gary Gregory wrote:
> Go for it!

+1

Luc

> 
> Gary
> 
> -------- Original message --------
> From: Benedikt Ritter <britter@apache.org>
> Date: 11/07/2014 03:47 (GMT-05:00)
> To: Commons Developers List <dev@commons.apache.org>
> Subject: [PROPOSAL] Setup up new sandbox component "Commons Text" with git as primary vcs (Was: Re: [sandbox] New sandbox component)
>
> Hi all,
> 
> as discussed, we'd like to create a new component which is focused on
> algorithms for string/text processing.
> 
> We (= Bruno and I) would like to create this new component with git as
> primary vcs right away, which will make Commons Text the second Commons
> component to use git. Please let me know if you have objections to
> this. I'll open an INFRA ticket for the new git repo this weekend.
> 
> Thanks!
> Benedikt
> 
> 2014-10-27 12:57 GMT+01:00 Benedikt Ritter <britter@apache.org>:
> 
>>
>>
>> 2014-10-27 12:32 GMT+01:00 Bruno P. Kinoshita <brunodepaulak@yahoo.com.br>:
>>
>>> Hi Benedikt!
>>>
>>>> Just let me know if you need help with the bootstrapping of the new
>>>> project.
>>>
>>> Yes, please :)
>>>
>>
>> I'll give folks some more time to share their thoughts about this and
>> create the new project then.
>>
>>
>>>
>>>> Maybe we should even announce this on announce@. There may be other
>>>> projects interested in a library like this (for example Apache Tika [1])
>>>
>>> Good idea! Should we drop a note there once the project has been created
>>> or after we already have some code in there?
>>>
>>
>> The latter seems appropriate to me.
>>
>>
>>>
>>> Thanks!
>>> Bruno
>>>
>>>
>>> From: Benedikt Ritter <britter@apache.org>
>>> To: Commons Developers List <dev@commons.apache.org>; Bruno P. Kinoshita <brunodepaulak@yahoo.com.br>
>>> Sent: Monday, October 27, 2014 5:45 AM
>>> Subject: Re: [sandbox] New sandbox component
>>>
>>> No objections from my side. I think this is a good idea. Just let me know
>>> if you need help with the bootstrapping of the new project. Maybe we should
>>> even announce this on announce@. There may be other projects interested
>>> in a library like this (for example Apache Tika [1])
>>>
>>> Benedikt
>>>
>>> [1] http://tika.apache.org/
>>>
>>>
>>>
>>> 2014-10-27 0:41 GMT+01:00 Bruno P. Kinoshita <brunodepaulak@yahoo.com.br>:
>>>
>>> Hello all,
>>>
>>> At the moment I'm working with data matching and record linkage, and had
>>> to port some existing string comparison algorithms found in several open
>>> source projects (fuzzy-search-tools, simmetrics, lingpipe, [lang], [codec]).
>>> While doing that I noticed LANG-591 [1], which suggests a more complex
>>> Levenshtein distance algorithm. There are several other algorithms too
>>> (Damerau-Levenshtein, Jaro, Jaro-Winkler, Jaccard, bitap, q-gram, Soundex,
>>> Metaphone). Instead of trying to put them all in, say, [lang], I'd like to
>>> experiment with a new [text] component in the sandbox, if there are no
>>> objections.
>>>
>>> I will take a look at the existing code and its license, but most of
>>> these algorithms have good wiki pages with pseudocode available, as well
>>> as academic papers.
>>>
>>> Maybe this component could be useful for other projects like [lang],
>>> Lucene, larsga/Duke, and Talend Open Studio. And even though my initial use
>>> case for this would be string comparison, I think it could support other
>>> use cases too.
>>>
>>> Thoughts on this? Anyone else interested in such a component?
>>>
>>> Thanks!
>>> Bruno
>>>
>>> [1] https://issues.apache.org/jira/browse/LANG-591
>>>
>>>
>>>
>>> --
>>>
>>> http://people.apache.org/~britter/
>>> http://www.systemoutprintln.de/
>>> http://twitter.com/BenediktRitter
>>> http://github.com/britter
>>>
>>
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

