uima-user mailing list archives

From Donatas Remeika <donatas.reme...@gmail.com>
Subject Re: New dictionary annotator
Date Fri, 02 Dec 2016 09:37:10 GMT
During the next week :)

Donatas

On Fri, Dec 2, 2016 at 11:32 AM Hugues de Mazancourt <hugues@mazancourt.com>
wrote:

> Cool!
> Any idea how far off that near future is?
> ;-)
>
> — Hugues
>
>
>
> > On 2 Dec 2016, at 10:26, Donatas Remeika <donatas.remeika@gmail.com> wrote:
> >
> > Hi Hugues,
> >
> > Thanks for the feedback. Accent-insensitive matching is indeed a needed
> > feature. I will implement it in the near future.
> >
> > Best regards,
> > Donatas Remeika
> >
> > On Fri, Dec 2, 2016 at 11:02 AM Hugues de Mazancourt <hugues@mazancourt.com>
> > wrote:
> >
> >> Thanks for this contribution.
> >>
> >> Do you have any plan to make the lookup accent-insensitive? Or do you
> >> know of a component that would do the job?
> >> I'm currently using ConceptMapper outside of Ruta and MARKTABLE from
> >> within Ruta, but neither handles accents correctly (btw, ConceptMapper
> >> is *very* slow at resource loading, which can be a problem).
> >>
> >> My point is: I have lists containing elements like « événement » and I
> >> would like text like « EVENEMENT » or even « évènement » to match that
> >> list. Lowercasing the text is not a solution, as « é » is mapped to
> >> uppercase « É » in the French locale, which has nothing to do with « e ».
> >> I guess you have the same problem with Latvian.
> >>
> >> Best,
> >>
> >>
> >> Hugues de Mazancourt
> >> http://about.me/mazancourt
> >>
> >>
> >>
> >>
> >>> On 30 Nov 2016, at 15:38, Donatas Remeika <donatas.remeika@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Just wanted to let you know that we created a new (yet another)
> >>> dictionary annotator.
> >>>
> >>> The reasons for creating it were:
> >>> - Quite often we used Ruta in our pipelines only because of its MARKTABLE
> >>> action, which is able to set several features on an annotation
> >>> - Sometimes dictionaries contain duplicate entries with different
> >>> features, and we need to create annotations for each entry
> >>> - The possibility of using a custom tokenizer for dictionary entries
> >>> (the default is a whitespace tokenizer)
> >>>
> >>> It was inspired by both the DKPro dictionary-annotator and Ruta MARKTABLE.
> >>> Big thanks to their developers!
> >>>
> >>> Code with examples can be found at
> >>> https://github.com/tokenmill/dictionary-annotator
> >>>
> >>> BTW, maybe someone knows of a ConceptMapper alternative that is more
> >>> uimaFIT-friendly?
> >>>
> >>> Best regards,
> >>> Donatas
> >>
> >>
>
>
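
The accent-insensitive lookup requested in this thread (matching « EVENEMENT » and « évènement » against a list entry « événement ») can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from the dictionary-annotator code; the function names `fold`, `build_index` and `lookup` are hypothetical. It assumes that stripping Unicode combining marks after NFD decomposition is acceptable for the target language, which may not hold where diacritics distinguish different words.

```python
import unicodedata


def fold(text: str) -> str:
    """Return a case- and accent-insensitive key for `text`."""
    # NFD splits "é" into "e" plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a locale-independent, more aggressive lower().
    return stripped.casefold()


def build_index(entries):
    """Map each folded key to the original dictionary entries sharing it."""
    index = {}
    for entry in entries:
        index.setdefault(fold(entry), []).append(entry)
    return index


def lookup(index, token):
    """Return the original entries matching `token`, ignoring case and accents."""
    return index.get(fold(token), [])


if __name__ == "__main__":
    index = build_index(["événement"])
    for token in ["EVENEMENT", "évènement", "Événement", "chat"]:
        print(token, "->", lookup(index, token))
```

Folding both the dictionary entries and the input tokens with the same function keeps the original entry text available for the annotation, while the comparison itself happens on the folded keys.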
