uima-user mailing list archives

From Hugues de Mazancourt <hug...@mazancourt.com>
Subject Re: New dictionary annotator
Date Fri, 02 Dec 2016 09:32:15 GMT
Cool!
Any idea how far off that near future is?
;-)

— Hugues



> On 2 Dec 2016 at 10:26, Donatas Remeika <donatas.remeika@gmail.com> wrote:
> 
> Hi Hugues,
> 
> Thanks for the feedback. Indeed, accent-insensitive matching is a needed
> feature. I will implement it in the near future.
> 
> Best regards,
> Donatas Remeika
> 
> On Fri, Dec 2, 2016 at 11:02 AM Hugues de Mazancourt <hugues@mazancourt.com>
> wrote:
> 
>> Thanks for this contribution.
>> 
>> Do you have any plans to make the lookup accent-insensitive? Or do you
>> know of a component that would do the job?
>> I’m currently using ConceptMapper outside of Ruta and MARKTABLE from
>> within Ruta, but neither handles accents correctly (BTW, ConceptMapper
>> is *very* slow at loading resources, which can be a problem).
>> 
>> My point is: I have lists containing elements like « événement », and I
>> would like text like « EVENEMENT » or even « évènement » to match that
>> list. Lowercasing the text is not a solution: in the French locale « é »
>> maps to uppercase « É », not « E », so case folding alone never turns
>> « é » into « e ». I guess you have the same problem with Latvian.
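The problem described above is commonly handled by folding both the dictionary entries and the document text to an accent-free, lower-case key before lookup. The following is a minimal sketch of that idea in Java, assuming plain `java.text.Normalizer`; it is not a feature of ConceptMapper, Ruta, or the dictionary-annotator discussed in this thread, and the class `AccentFolding` and its `fold` method are hypothetical names. NFD decomposition splits a precomposed letter such as « é » into « e » plus a combining accent, and removing the combining marks (Unicode category `Mn`) leaves only the base letter.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of any UIMA component mentioned in the thread.
final class AccentFolding {

    // Folds a string into an accent- and case-insensitive lookup key:
    // 1. NFD splits precomposed letters, e.g. "é" -> "e" + U+0301 (combining acute).
    // 2. The regex deletes every combining mark (Unicode category Mn).
    // 3. Locale.ROOT lowercasing avoids locale-specific case rules.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{Mn}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String entry = "événement";
        String[] texts = {"EVENEMENT", "évènement", "ÉVÉNEMENT", "evenements"};
        for (String text : texts) {
            boolean match = fold(text).equals(fold(entry));
            System.out.println(text + " -> " + fold(text) + " matches: " + match);
        }
    }
}
```

Here every spelling of « événement » folds to `evenement`, while `evenements` still does not match. Only letters with a canonical decomposition are affected: Latvian « ā », « č » or « š » fold to « a », « c », « s », but « œ », « ø » or « ł » have no decomposition and pass through unchanged, so they would need an explicit mapping table. Because folding can change string length, annotation offsets should be taken from the original tokens, with the folded form used only as the lookup key.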
>> 
>> Best,
>> 
>> 
>> Hugues de Mazancourt
>> http://about.me/mazancourt
>> 
>> 
>> 
>> 
>>> On 30 Nov 2016 at 15:38, Donatas Remeika <donatas.remeika@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Just wanted to let you know that we have created a new (probably yet
>>> another) dictionary annotator.
>>> 
>>> Our reasons for creating it were:
>>> - Quite often we used Ruta in our pipelines only because of its MARKTABLE
>>> action, which is able to set several features on an annotation
>>> - Sometimes dictionaries contain duplicate entries with different features,
>>> and we need to create an annotation for each entry
>>> - The possibility to use a custom tokenizer for dictionary entries (the
>>> default is a whitespace tokenizer)
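The second and third points can be pictured as a small data structure: an index keyed by each entry's token sequence, where one key maps to a list of entries so that duplicates with different features are all kept, plus a tokenizer passed in as a function with whitespace splitting as the default. The Java sketch below is a hypothetical illustration of that idea only; `DictionarySketch`, `Entry`, `Match` and their methods are made-up names, not the API of the tokenmill dictionary-annotator, and the code assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not the tokenmill dictionary-annotator API.
final class DictionarySketch {

    // One dictionary line: surface form plus the feature values to set on the annotation.
    record Entry(String text, Map<String, String> features) {}

    // One match: token span [begin, end) and the entry that produced it.
    record Match(int begin, int end, Entry entry) {}

    private final Function<String, List<String>> tokenizer;
    // Keyed by the entry's token sequence; a list per key keeps duplicate entries.
    private final Map<List<String>, List<Entry>> index = new HashMap<>();
    private int maxLength = 0;

    DictionarySketch(Function<String, List<String>> tokenizer) {
        this.tokenizer = tokenizer;
    }

    // Default: plain whitespace tokenizer.
    DictionarySketch() {
        this(s -> Arrays.asList(s.trim().split("\\s+")));
    }

    void add(Entry e) {
        List<String> key = tokenizer.apply(e.text());
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(e);
        maxLength = Math.max(maxLength, key.size());
    }

    // Tries every start position and every length up to the longest entry.
    // Emits one Match per entry, so duplicates yield several matches on one span;
    // overlapping matches are all kept (no longest-match filtering).
    List<Match> find(List<String> tokens) {
        List<Match> matches = new ArrayList<>();
        for (int begin = 0; begin < tokens.size(); begin++) {
            for (int len = 1; len <= maxLength && begin + len <= tokens.size(); len++) {
                List<Entry> hits = index.get(tokens.subList(begin, begin + len));
                if (hits != null) {
                    for (Entry e : hits) {
                        matches.add(new Match(begin, begin + len, e));
                    }
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        DictionarySketch dict = new DictionarySketch();
        dict.add(new Entry("New York", Map.of("type", "City")));
        dict.add(new Entry("New York", Map.of("type", "State")));
        dict.add(new Entry("York", Map.of("type", "City")));
        List<String> tokens = Arrays.asList("I", "love", "New", "York");
        for (Match m : dict.find(tokens)) {
            System.out.println(m.begin() + "-" + m.end() + " " + m.entry().features());
        }
    }
}
```

For the sample sentence the sketch reports both « New York » entries on the same token span (one with `type=City`, one with `type=State`) plus the single « York » entry, i.e. one match per dictionary entry. Passing another tokenizer, for example `s -> Arrays.asList(s.split("[\\s-]+"))` to also split on hyphens, changes only how entries are keyed; the document tokens given to `find` have to be produced the same way.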
>>> 
>>> It was inspired by both the DKPro dictionary-annotator and Ruta MARKTABLE.
>>> Big thanks to their developers!
>>> 
>>> Code with examples can be found at
>>> https://github.com/tokenmill/dictionary-annotator
>>> 
>>> BTW, does anyone know of a ConceptMapper alternative that is more
>>> uimaFIT-friendly?
>>> 
>>> Best regards,
>>> Donatas
>> 
>> 

