uima-user mailing list archives

From Hugues de Mazancourt <hug...@mazancourt.com>
Subject Re: New dictionary annotator
Date Fri, 02 Dec 2016 09:40:08 GMT
Great. Keep me informed if you need a beta tester!

— Hugues


> On 2 Dec 2016, at 10:37, Donatas Remeika <donatas.remeika@gmail.com> wrote:
> 
> During the next week :)
> 
> Donatas
> 
> On Fri, Dec 2, 2016 at 11:32 AM Hugues de Mazancourt <hugues@mazancourt.com>
> wrote:
> 
>> Cool!
>> Any idea how far off that near future is?
>> ;-)
>> 
>> — Hugues
>> 
>> 
>> 
>>> On 2 Dec 2016, at 10:26, Donatas Remeika <donatas.remeika@gmail.com> wrote:
>>> 
>>> Hi Hugues,
>>> 
>>> Thanks for the feedback. Accent-insensitive matching is indeed a needed
>>> feature. I will implement it in the near future.
>>> 
>>> Best regards,
>>> Donatas Remeika
>>> 
>>> On Fri, Dec 2, 2016 at 11:02 AM Hugues de Mazancourt <hugues@mazancourt.com>
>>> wrote:
>>> 
>>>> Thanks for this contribution.
>>>> 
>>>> Do you have any plans to make the lookup accent-insensitive? Or do you
>>>> know of a component that would do the job?
>>>> I’m currently using ConceptMapper outside of Ruta and MARKTABLE from
>>>> within Ruta, but neither handles accents correctly (btw, ConceptMapper
>>>> is *very* slow at resource loading, which can be a problem).
>>>> 
>>>> My point is: I have lists containing elements like « événement » and I
>>>> would like text like « EVENEMENT » or even « évènement » to match that
>>>> list. Lowercasing the text is not a solution, because case mapping only
>>>> pairs « é » with uppercase « É » in the French locale, which has nothing
>>>> to do with « e ». I guess you have the same problem with Latvian.
>>>> 
>>>> Best,
>>>> 
>>>> 
>>>> Hugues de Mazancourt
>>>> http://about.me/mazancourt
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 30 Nov 2016, at 15:38, Donatas Remeika <donatas.remeika@gmail.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Just wanted to let you know that we created a new (probably yet another)
>>>>> dictionary annotator.
>>>>> 
>>>>> The reasons for creating it were:
>>>>> - Quite often we used Ruta in our pipelines only because of its
>>>>> MARKTABLE action, which is able to set several features on an annotation
>>>>> - Sometimes dictionaries contain duplicate entries with different
>>>>> features, and we need to create an annotation for each entry
>>>>> - The possibility of using a custom tokenizer for dictionary entries
>>>>> (the default is a whitespace tokenizer)
>>>>> 
>>>>> It was inspired by both the DKPro dictionary-annotator and Ruta MARKTABLE.
>>>>> Big thanks to their developers!
>>>>> 
>>>>> Code with examples can be found at
>>>>> https://github.com/tokenmill/dictionary-annotator
>>>>> 
>>>>> BTW, does anyone know of a ConceptMapper alternative that is more
>>>>> uimaFIT-friendly?
>>>>> 
>>>>> Best regards,
>>>>> Donatas
>>>> 
>>>> 
>> 
>> 
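
The accent-insensitive lookup discussed in this thread can be sketched
roughly as follows. This is only an illustration in Python using the
standard library, not the dictionary-annotator's implementation; the names
fold, lookup and index are hypothetical. The idea is to decompose each
string (Unicode NFD), drop the combining marks, and case-fold the result,
applying the same folding to both the dictionary entries and the text.

```python
import unicodedata


def fold(text: str) -> str:
    """Return a case- and accent-insensitive key for `text` (hypothetical helper)."""
    # NFD splits a precomposed letter such as "é" into "e" + a combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lowercase intended for caseless matching.
    return stripped.casefold()


# Dictionary entries are indexed by their folded form (assumed toy data).
entries = ["événement"]
index = {fold(entry): entry for entry in entries}


def lookup(token: str):
    """Return the original dictionary entry matching `token`, or None."""
    return index.get(fold(token))


if __name__ == "__main__":
    for token in ["EVENEMENT", "évènement", "événement", "avenement"]:
        print(token, "->", lookup(token))
```

Two caveats under these assumptions: letters such as "ø" or "ł" have no
canonical decomposition and would need an explicit mapping, and folding can
change string length (for example "ß" becomes "ss"), so an annotator that
reports offsets has to map matches back to the original text.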

