uima-user mailing list archives

From Nikolai Krot <tal...@gmail.com>
Subject Re: fuzzy matching possible?
Date Fri, 03 May 2019 16:28:23 GMT
Hi Peter,

Thank you for your prompt reply.

Regarding pre-annotation with another engine: say I manage to annotate
the words of interest and additionally set an attribute, something like
this:

... <word lemma="arbeiten">gearbeitet</word>...

Is there a simple way to configure what Ruta rules match against, so
that a rule matches on either the covered text ("gearbeitet" in our case)
or the value of the attribute "lemma" ("arbeiten" in our case)?
That is, the match should succeed if either of the two fields matches.
This would make some rules simpler.
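
To make the intended semantics concrete, here is a minimal sketch in plain
Python rather than Ruta. The Word class and the matches function are
hypothetical names made up for illustration; they are not part of UIMA or
Ruta, and the sketch only shows the "either field may match" behaviour I
have in mind.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for a pre-annotated token (not a UIMA type)."""
    text: str   # covered text, e.g. "gearbeitet"
    lemma: str  # value of the "lemma" attribute, e.g. "arbeiten"


def matches(word: Word, target: str) -> bool:
    """Return True if the target equals the covered text or the lemma."""
    return target in (word.text, word.lemma)


word = Word(text="gearbeitet", lemma="arbeiten")
print(matches(word, "gearbeitet"))  # matched via the covered text
print(matches(word, "arbeiten"))    # matched via the lemma
print(matches(word, "Arbeit"))      # neither field matches
```

A single rule written against the target string would then cover both the
inflected form and the lemma, instead of needing one rule per field.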

Best regards,
Nikolai

On Fri, May 3, 2019 at 2:03 PM Peter Klügl <peter.kluegl@averbis.com> wrote:

> Hi,
>
>
> there is/was support for a weighted edit distance in the trie lookup,
> but that functionality was not maintained for many years.
>
> The dictionary lookup functionality in Ruta is overall very limited.
> Normally, one uses a separate analysis engine with extended logic
> (ConceptMapper?) for creating the annotations, which are then later
> reused in rules.
>
>
> Best,
>
>
> Peter
>
> Am 03.05.2019 um 13:16 schrieb Nikolai Krot:
> > Hi all,
> >
> > Is there a way to match a word fuzzily in the UIMA Ruta language?
> > I am trying to work out how to handle typos and OCR errors... It is
> > hardly possible to list every way a word could have been garbled.
> >
> > Best regards,
> > Nikolai Krot
> >
> --
> Dr. Peter Klügl
> R&D Text Mining/Machine Learning
>
> Averbis GmbH
> Salzstr. 15
> 79098 Freiburg
> Germany
>
> Fon: +49 761 708 394 0
> Fax: +49 761 708 394 10
> Email: peter.kluegl@averbis.com
> Web: https://averbis.com
>
> Headquarters: Freiburg im Breisgau
> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>
>
