uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: fuzzy matching possible?
Date Tue, 07 May 2019 07:20:42 GMT
Hi,


in the end you need to check both, but you could refactor the
checks into a new condition, for example (not tested):


CONDITION LemmaCT(ANNOTATION word, STRING check) = OR(word.lemma == check, word.ct == check);

w: Word{LemmaCT(w, "gearbeitet")};

...or with two string arguments, so that the lemma and the covered text
can be checked against different values.


If you have many rules like this, I would prefer an additional analysis
engine because those rules may become less maintainable over time.
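To make the logic of the condition concrete outside Ruta, here is a minimal Python sketch. It is an illustration only and not Ruta or UIMA API. `Token`, `lemma_or_ct`, `TermMatch` and `annotate_terms` are hypothetical names. `lemma_or_ct` mirrors the one-argument and two-argument variants of the check. `annotate_terms` is the kind of pre-annotation pass a separate analysis engine could run once, so that the rules only need to match the resulting annotations instead of repeating the OR in every rule.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a Ruta Word annotation."""
    begin: int
    end: int
    covered_text: str       # plays the role of w.ct in the rule
    lemma: Optional[str]    # plays the role of w.lemma in the rule


def lemma_or_ct(token: Token, lemma: str, text: Optional[str] = None) -> bool:
    """True if the lemma or the covered text matches.

    With one string, the same value is checked against both fields (like
    LemmaCT). With two strings, lemma and covered text are checked against
    different values (the "two string arguments" variant).
    """
    if text is None:
        text = lemma
    return token.lemma == lemma or token.covered_text == text


@dataclass(frozen=True)
class TermMatch:
    """Hypothetical annotation produced by the pre-annotation pass."""
    begin: int
    end: int
    term: str


def annotate_terms(tokens: Iterable[Token], terms: Iterable[str]) -> List[TermMatch]:
    """Run the lemma-or-covered-text check once for every token and term."""
    term_list = list(terms)
    matches: List[TermMatch] = []
    for tok in tokens:
        for term in term_list:
            if lemma_or_ct(tok, term):
                matches.append(TermMatch(tok.begin, tok.end, term))
    return matches


if __name__ == "__main__":
    sentence = [
        Token(0, 3, "Ich", "ich"),
        Token(4, 8, "habe", "haben"),
        Token(9, 19, "gearbeitet", "arbeiten"),
    ]
    for match in annotate_terms(sentence, ["arbeiten", "haben"]):
        print(match)
```

The design choice follows the advice above. When there are a few dozen such checks, doing them in one place keeps the individual rules short.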


Best,


Peter



Am 04.05.2019 um 13:44 schrieb Nikolai Krot:
> Hi Peter,
>
> Thank you for the answer.
>
>> that mainly depends on the typesystem. Your rule could look something like:
>>
>> w:Word{OR(w.lemma == "arbeiten", w.ct == "gearbeitet")};
> I know of this syntax. My question is whether there is a shorter way to
> specify that, whenever I need to match word text, the match should check
> both the lemma and the ct fields. Think of a few dozen rules like this...
>
> Best regards,
> Nikolai
>>
>> Best,
>>
>>
>> Peter
>>
>> Am 03.05.2019 um 18:28 schrieb Nikolai Krot:
>>> Hi Peter,
>>>
>>> Thank you for your prompt reply.
>>>
>>> Speaking of pre-annotation with another engine: say I managed to
>>> annotate the words of interest and additionally set an attribute,
>>> something like this:
>>>
>>> ... <word lemma="arbeiten">gearbeitet</word>...
>>>
>>> Is there a simple way to configure the object of matching in Ruta rules so
>>> that a rule matches either the actual text ("gearbeitet" in our case) or the
>>> value of the attribute "lemma" ("arbeiten" in our case)?
>>> That is, the match should return True if either of the fields evaluates to True.
>>> This would make some rules simpler.
>>>
>>> Best regards,
>>> Nikolai
>>>
>>> On Fri, May 3, 2019 at 2:03 PM Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>> Hi,
>>>>
>>>>
>>>> there is/was support for a weighted edit distance in the trie lookup,
>>>> but that functionality was not maintained for many years.
>>>>
>>>> The dictionary lookup functionality in Ruta is overall very limited.
>>>> Normally, one uses a separate analysis engine with extended logic
>>>> (ConceptMapper?) for creating the annotations, which are then later
>>>> reused in rules.
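The trie-based weighted edit distance mentioned above is not reproduced here. As a rough illustration of the general idea only, here is a Python sketch of a weighted Levenshtein distance in which substitutions between characters that OCR tends to confuse cost less than other substitutions. The confusion table, its cost values, and the names `weighted_edit_distance` and `fuzzy_lookup` are assumptions for this example. The sketch compares the input against each dictionary entry one at a time instead of walking a trie, so it suits small word lists only.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Assumed, illustrative costs: character pairs that OCR often confuses
# are cheaper to substitute than arbitrary characters (cost 1.0).
OCR_CONFUSIONS: Dict[FrozenSet[str], float] = {
    frozenset("l1"): 0.5,
    frozenset("il"): 0.5,
    frozenset("O0"): 0.5,
    frozenset("ce"): 0.5,
}


def sub_cost(a: str, b: str) -> float:
    """Cost of replacing character a with character b."""
    if a == b:
        return 0.0
    return OCR_CONFUSIONS.get(frozenset((a, b)), 1.0)


def weighted_edit_distance(s: str, t: str, indel: float = 1.0) -> float:
    """Levenshtein distance with per-pair substitution weights (row-by-row DP)."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * indel] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(
                prev[j] + indel,                               # delete s[i-1]
                cur[j - 1] + indel,                            # insert t[j-1]
                prev[j - 1] + sub_cost(s[i - 1], t[j - 1]),    # match or substitute
            )
        prev = cur
    return prev[len(t)]


def fuzzy_lookup(word: str, dictionary: Iterable[str],
                 max_cost: float = 1.0) -> List[Tuple[str, float]]:
    """Return dictionary entries within max_cost of word, cheapest first."""
    hits = []
    for entry in dictionary:
        cost = weighted_edit_distance(word, entry)
        if cost <= max_cost:
            hits.append((entry, cost))
    hits.sort(key=lambda hit: (hit[1], hit[0]))
    return hits


if __name__ == "__main__":
    # "gearbeltet": an OCR-style misreading of "gearbeitet" (i read as l)
    print(fuzzy_lookup("gearbeltet", ["gearbeitet", "gearbeitete", "bearbeitet"]))
```

The threshold and the weights in a sketch like this would need tuning on real OCR errors. Annotations created this way in a separate step could then be reused in Ruta rules as described above.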
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>> Am 03.05.2019 um 13:16 schrieb Nikolai Krot:
>>>>> Hi all,
>>>>>
>>>>> Is there a way to match a word fuzzily in the UIMA Ruta
>>>>> language? I am thinking about how to overcome problems with typos and OCR
>>>>> mistakes... It is hardly possible to list all the ways a word
>>>>> could have been broken.
>>>>>
>>>>> Best regards,
>>>>> Nikolai Krot
>>>>>
>>>> --
>>>> Dr. Peter Klügl
>>>> R&D Text Mining/Machine Learning
>>>>
>>>> Averbis GmbH
>>>> Salzstr. 15
>>>> 79098 Freiburg
>>>> Germany
>>>>
>>>> Fon: +49 761 708 394 0
>>>> Fax: +49 761 708 394 10
>>>> Email: peter.kluegl@averbis.com
>>>> Web: https://averbis.com
>>>>
>>>> Headquarters: Freiburg im Breisgau
>>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>>>
>>>>