uima-user mailing list archives

From Nikolai Krot <tal...@gmail.com>
Subject Re: fuzzy matching possible?
Date Sat, 04 May 2019 11:44:40 GMT
Hi Peter,

Thank you for the answer.

> that mainly depends on the type system. Your rule could look something like:
>
>
> w:Word{OR(w.lemma == "arbeiten", w.ct == "gearbeitet")};

I know this syntax. My question is whether there is a shorter way to tell
Ruta that whenever I match word text, the matching should check both the
lemma and the ct fields. Think of a few dozen rules like this...
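Outside of Ruta, the idea behind the question (write the lemma-or-covered-text test once and reuse it in every rule) can be sketched in Python. This is an illustration only, assuming each word carries its covered text and a lemma: the `Word` class and the `word_matches` helper are hypothetical and are not part of the Ruta or UIMA APIs.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for a Word annotation with a lemma feature."""
    covered_text: str
    lemma: str


def word_matches(word: Word, *targets: str) -> bool:
    """Return True if the word's lemma or its covered text equals any target.

    Keeping the OR in one place means each individual rule only names its targets.
    """
    return any(t == word.lemma or t == word.covered_text for t in targets)


words = [
    Word("gearbeitet", "arbeiten"),
    Word("Haus", "Haus"),
    Word("arbeitete", "arbeiten"),
]
hits = [w.covered_text for w in words if word_matches(w, "arbeiten")]
print(hits)  # ['gearbeitet', 'arbeitete']
```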

Best regards,
Nikolai
>
>
> Best,
>
>
> Peter
>
> Am 03.05.2019 um 18:28 schrieb Nikolai Krot:
> > Hi Peter,
> >
> > Thank you for your prompt reply.
> >
> > Regarding pre-annotation with another engine: say I managed to
> > annotate the words of interest and additionally set an attribute, something
> > like this:
> >
> > ... <word lemma="arbeiten">gearbeitet</word>...
> >
> > Is there a simple way to configure the object of matching in Ruta rules so
> > that a rule matches either the actual text ("gearbeitet" in our case) or the
> > value of the attribute "lemma" ("arbeiten" in our case)?
> > That is, the match should return True if either of the fields evaluates to True.
> > This would make some rules simpler.
> >
> > Best regards,
> > Nikolai
> >
> > On Fri, May 3, 2019 at 2:03 PM Peter Klügl <peter.kluegl@averbis.com> wrote:
> >
> >> Hi,
> >>
> >>
> >> there is/was support for a weighted edit distance in the trie lookup,
> >> but that functionality has not been maintained for many years.
> >>
> >> The dictionary lookup functionality in Ruta is overall very limited.
> >> Normally, one uses a separate analysis engine with extended logic
> >> (ConceptMapper?) to create the annotations, which are then reused
> >> in rules.
> >>
> >>
> >> Best,
> >>
> >>
> >> Peter
> >>
> >> Am 03.05.2019 um 13:16 schrieb Nikolai Krot:
> >>> Hi all,
> >>>
> >>> Is it possible to match a word fuzzily in the UIMA Ruta
> >>> language? I am thinking about how to overcome problems with typos and OCR
> >>> mistakes... It is hardly possible to list all the ways a word
> >>> could have been broken.
> >>>
> >>> Best regards,
> >>> Nikolai Krot
> >>>
> >> --
> >> Dr. Peter Klügl
> >> R&D Text Mining/Machine Learning
> >>
> >> Averbis GmbH
> >> Salzstr. 15
> >> 79098 Freiburg
> >> Germany
> >>
> >> Fon: +49 761 708 394 0
> >> Fax: +49 761 708 394 10
> >> Email: peter.kluegl@averbis.com
> >> Web: https://averbis.com
> >>
> >> Headquarters: Freiburg im Breisgau
> >> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
> >> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
> >>
> >>
>
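Peter's two suggestions in this thread, a weighted edit distance and a separate engine that pre-annotates matches for later rules, can be combined in a small Python sketch. This is not Ruta's trie lookup and not ConceptMapper. The OCR substitution costs, the `max_cost` threshold, and the `weighted_edit_distance` and `fuzzy_annotate` functions are hypothetical. The output tuples (begin, end, lemma) stand in for the annotations such an engine might create for Ruta rules to reuse.

```python
import re

# Hypothetical reduced substitution costs for common OCR confusions.
OCR_SUBSTITUTIONS = {
    frozenset("0o"): 0.3,
    frozenset("1l"): 0.3,
    frozenset("il"): 0.3,
    frozenset("5s"): 0.4,
}


def sub_cost(a: str, b: str) -> float:
    """Cost of substituting a with b: 0 if equal, reduced for OCR pairs, else 1."""
    if a == b:
        return 0.0
    return OCR_SUBSTITUTIONS.get(frozenset((a, b)), 1.0)


def weighted_edit_distance(source: str, target: str) -> float:
    """Levenshtein distance with insert/delete cost 1 and weighted substitutions."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i in range(1, len(source) + 1):
        curr = [float(i)] + [0.0] * len(target)
        for j in range(1, len(target) + 1):
            curr[j] = min(
                prev[j] + 1.0,  # delete source[i-1]
                curr[j - 1] + 1.0,  # insert target[j-1]
                prev[j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
        prev = curr
    return prev[len(target)]


def fuzzy_annotate(text: str, dictionary: dict, max_cost: float = 1.0):
    """Yield (begin, end, lemma) for tokens within max_cost of a dictionary form."""
    for m in re.finditer(r"\w+", text):
        token = m.group().lower()
        best_cost, best_form = min(
            (weighted_edit_distance(token, form), form) for form in dictionary
        )
        if best_cost <= max_cost:
            yield m.start(), m.end(), dictionary[best_form]


forms = {"gearbeitet": "arbeiten", "arbeiten": "arbeiten"}
for begin, end, lemma in fuzzy_annotate("Er hat gearbeltet.", forms):
    print(begin, end, lemma)  # 7 17 arbeiten
```

Tokenizing with `\w+` and lowercasing are simplifications. In a UIMA pipeline the spans would more likely come from existing token annotations, and the lemma would be stored as a feature on the created annotation, as in the `<word lemma="arbeiten">` example above. Comparing every token against every dictionary form is linear in the dictionary size. A trie-based lookup, like the one Peter mentions, is one way to avoid that.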
