uima-user mailing list archives

From Donatas Remeika <donatas.reme...@gmail.com>
Subject Re: New dictionary annotator
Date Fri, 02 Dec 2016 09:26:40 GMT
Hi Hugues,

Thanks for the feedback. Accent-insensitive matching is indeed a needed
feature. I will implement it in the near future.
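
One common way to get accent-insensitive matching is to fold both the
dictionary entries and the text before comparing them: decompose to Unicode
NFD, drop the combining marks, then lowercase. The sketch below only
illustrates that idea. The AccentFold class and its fold method are
hypothetical names and are not part of dictionary-annotator.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper for illustration only; not the dictionary-annotator API.
class AccentFold {

    // Returns a case- and accent-insensitive key for the given string.
    static String fold(String s) {
        // NFD splits "é" into "e" followed by a combining acute accent.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove every combining mark (Unicode category M).
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Locale.ROOT keeps the case mapping independent of the default locale.
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String entry = fold("événement");
        System.out.println(entry.equals(fold("EVENEMENT")));
        System.out.println(entry.equals(fold("évènement")));
    }
}
```

Folding can change the length of the string, so an annotator would still need
to map matches back to offsets in the original text.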

Best regards,
Donatas Remeika

On Fri, Dec 2, 2016 at 11:02 AM Hugues de Mazancourt <hugues@mazancourt.com>
wrote:

> Thanks for this contribution.
>
> Do you have any plans to make the lookup accent-insensitive? Or do you
> know of a component that would do the job?
> I’m currently using ConceptMapper outside of Ruta and MARKTABLE from
> within Ruta, but neither handles accents correctly (btw, ConceptMapper
> is *very* slow at resource loading, which can be a problem).
>
> My point is: I have lists containing elements like « événement », and I
> would like text like « EVENEMENT » or even « évènement » to match that
> list. Lowercasing the text is not a solution, since case mapping only
> relates « é » to « É » and never to the unaccented « e ». I guess you
> have the same problem with Latvian.
>
> Best,
>
>
> Hugues de Mazancourt
> http://about.me/mazancourt
>
>
>
>
> > On 30 Nov 2016, at 15:38, Donatas Remeika <donatas.remeika@gmail.com> wrote:
> >
> > Hi,
> >
> > Just wanted to let you know that we have created a new (yet another)
> > dictionary annotator.
> >
> > The reasons for creating it were:
> > - Quite often we used Ruta in our pipelines only for its MARKTABLE
> > action, which can set several features on an annotation
> > - Sometimes dictionaries contain duplicate entries with different
> > features, and we need to create an annotation for each entry
> > - The ability to use a custom tokenizer for dictionary entries (the
> > default is a whitespace tokenizer)
> >
> > It was inspired by both DKPro dictionary-annotator and Ruta MARKTABLE.
> > Big thanks to their developers!
> >
> > Code with examples can be found at
> > https://github.com/tokenmill/dictionary-annotator
> >
> > BTW, does anyone know of a ConceptMapper alternative that is more
> > uimaFIT-friendly?
> >
> > Best regards,
> > Donatas
>
>
