uima-user mailing list archives

From Hugues de Mazancourt <hug...@mazancourt.com>
Subject Re: New dictionary annotator
Date Fri, 02 Dec 2016 09:02:44 GMT
Thanks for this contribution.

Do you have any plans to make the lookup accent-insensitive? Or do you know of a component
that would do the job?
I’m currently using ConceptMapper outside of Ruta and MARKTABLE from within Ruta, but neither
handles accents correctly (by the way, ConceptMapper is *very* slow at loading resources,
which can be a problem).

My point is: I have lists containing elements like « événement », and I would like text
like « EVENEMENT » or even « évènement » to match that list. Lowercasing the text is not
a solution, because case mapping keeps the accent: in French, « é » pairs with uppercase
« É », never with « e ». I guess you have the same problem with Latvian.
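
One common way to get accent-insensitive matching is to compute a "folded" key for both the dictionary entries and the text: decompose to Unicode NFD, drop the combining marks, then lowercase with a locale-independent rule. The sketch below uses only the JDK (`java.text.Normalizer` and a `\p{M}` regex); the class and method names are hypothetical, and wiring it into ConceptMapper, MARKTABLE or any UIMA annotator is not shown. Characters without a canonical decomposition (e.g. « œ », « ø ») are left unchanged, and match offsets should still be taken from the original, unfolded text.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentFolding {

    // Hypothetical helper: builds an accent- and case-insensitive match key.
    // NFD turns "é" into "e" + U+0301 (combining acute accent); the regex then
    // removes every combining mark, and Locale.ROOT avoids locale-specific casing.
    public static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String entry = "événement";
        String[] texts = {"EVENEMENT", "évènement", "Événement", "avenement"};
        for (String text : texts) {
            boolean matches = fold(text).equals(fold(entry));
            System.out.println(text + " -> " + fold(text) + " matches: " + matches);
        }
    }
}
```

Folding both sides with the same function is what makes « EVENEMENT », « évènement » and « événement » collide on the key « evenement ». The same decomposition also strips Latvian diacritics such as « ā », « š » or « ķ ».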


Hugues de Mazancourt

> On 30 Nov 2016, at 15:38, Donatas Remeika <donatas.remeika@gmail.com> wrote:
> Hi,
> Just wanted to let you know that we created a new (probably one more)
> dictionary annotator.
> Our reasons for creating it were:
> - Quite often we used Ruta in our pipelines only because of its MARKTABLE
> action, which can set several features on an annotation
> - Sometimes dictionaries contain duplicate entries with different features
> and we need to create annotations for each entry
> - The ability to use a custom tokenizer for dictionary entries (the default
> is a whitespace tokenizer)
> It was inspired by both the DKPro dictionary-annotator and Ruta MARKTABLE. Big
> thanks to their developers!
> Code with examples can be found at
> https://github.com/tokenmill/dictionary-annotator
> BTW, does anyone know of a ConceptMapper alternative that is more
> uimaFIT-friendly?
> Best regards,
> Donatas
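
The requirements listed in the quoted message (keeping duplicate dictionary entries that carry different features, and tokenizing entries with a pluggable tokenizer that defaults to whitespace splitting) can be illustrated with a plain-Java sketch. Everything here is hypothetical: `DictionarySketch`, `Entry` and `Match` are not the tokenmill dictionary-annotator API and use no UIMA types. In a real annotator each `Match` would become an annotation whose features are set from `entry.features()`. The lookup is a naive scan of every start position and span length up to the longest entry rather than a trie. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DictionarySketch {

    // Hypothetical dictionary entry: surface form plus arbitrary feature values.
    public record Entry(String text, Map<String, String> features) {}

    // A hit over tokens [beginToken, endToken) for one specific entry.
    public record Match(int beginToken, int endToken, Entry entry) {}

    private final Function<String, List<String>> tokenizer;
    // Token sequence -> every entry with that surface form (duplicates are kept).
    private final Map<List<String>, List<Entry>> index = new HashMap<>();
    private int maxLen = 0;

    public DictionarySketch(Function<String, List<String>> tokenizer) {
        this.tokenizer = tokenizer;
    }

    // Default tokenizer: split on runs of whitespace.
    public static List<String> whitespace(String s) {
        return Arrays.asList(s.trim().split("\\s+"));
    }

    public void add(Entry e) {
        List<String> key = tokenizer.apply(e.text());
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(e);
        maxLen = Math.max(maxLen, key.size());
    }

    // Try every start position and every length up to the longest entry;
    // emit one Match per entry, so duplicate surface forms yield several matches.
    public List<Match> lookup(List<String> tokens) {
        List<Match> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            for (int len = 1; len <= maxLen && i + len <= tokens.size(); len++) {
                List<Entry> hits = index.get(tokens.subList(i, i + len));
                if (hits != null) {
                    for (Entry e : hits) {
                        out.add(new Match(i, i + len, e));
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        DictionarySketch dict = new DictionarySketch(DictionarySketch::whitespace);
        dict.add(new Entry("New York", Map.of("type", "City")));
        dict.add(new Entry("New York", Map.of("type", "State")));
        // Prints one Match per duplicate entry, both covering tokens 2..4.
        for (Match m : dict.lookup(whitespace("I love New York"))) {
            System.out.println(m);
        }
    }
}
```

Because the tokenizer is just a `Function<String, List<String>>`, a custom one (for example a lambda splitting on punctuation as well) can be passed to the constructor; the index key is the token list itself, which relies on `List.equals`/`hashCode` being element-based.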
