uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Usage of anchors
Date Thu, 29 Aug 2019 12:27:45 GMT
Hi,


The second option (annotating the string once and using the annotation
in the rules) should be preferred, at least until UIMA-3862 is resolved
with some additional indexing.

It is of course not so problematic if the literal matching condition is
not the starting anchor. However, it is still annoying that the rule
elements need to be designed according to the dynamic partitioning of the
RutaBasis. This easily leads to problems in larger pipelines.
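
As a rough sketch of the two options from the question: the type names,
the word list file and the rule bodies below are assumptions made up for
illustration, and MARKFAST with a WORDLIST is used in place of MARKTABLE
only to keep the example short.

```
// Hypothetical types, for illustration only.
DECLARE Company, CompanyMention, CompanyWithSuffix;

// Option 1: the string literal is repeated in every rule that needs it.
"Drooms"{-> MARK(CompanyMention)};
"Drooms" CW{-> MARK(CompanyWithSuffix, 1, 2)};

// Option 2: annotate the string once, then use the annotation in the rules.
// 'Companies.txt' is an assumed resource file containing the line: Drooms
WORDLIST CompanyList = 'Companies.txt';
Document{-> MARKFAST(Company, CompanyList)};
Company{-> MARK(CompanyMention)};
Company CW{-> MARK(CompanyWithSuffix, 1, 2)};
```

With option 2 the string is looked up once, and the later rules start
from the Company annotation instead of a literal matching condition.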


Best,


Peter


Am 29.08.2019 um 11:59 schrieb Nikolai Krot:
> Hi Peter,
>
> I have a question about this comment of yours:
>
> < ... but the matching using literal string expression is still really
> inefficient.
>
> What do you mean by "inefficient"? Do you mean it is slow? Say, if I want
> to use a literal in one hundred rules, what is a better strategy:
> 1) writing the string literally in each of these 100 rules; or
> 2) annotating the string (using MARKTABLE) and then using the annotation in
> these 100 rules?
>
> Best regards,
> Nikolai
>
> On Mon, Aug 26, 2019 at 2:27 PM Peter Klügl <peter.kluegl@averbis.com>
> wrote:
>
>> Hi,
>>
>>
>> Am 21.08.2019 um 15:47 schrieb Dominik Terweh:
>>> Hi Peter,
>>>
>>> Thanks a lot for the clarification. I was wondering about (10) too.
>>>
>>> Following your explanation, I was wondering: does it make sense to
>>> anchor sequences, such as in (8), and is it "legal" to use multiple
>>> anchors in a hierarchical fashion?
>>> Like A @(B @C D)?
>> Yes, it is "legal", but you have to be careful. (There are not enough
>> unit tests for those rules)
>>
>>
>>> Also, is there a difference between the processing of sequences of
>>> annotations or literals (given "A" is annotated as A and so on)?
>>> A @(B C D)
>>> Vs
>>> "A" @("B" "C" "D")
>>> Vs
>>> A @("B" C "D")
>>
>> It should not make a difference for the result, but the matching using
>> literal string expressions is still really inefficient.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>> Best
>>> Dominik
>>>
>>>
>>>
>>> Dominik Terweh
>>> Praktikant
>>>
>>> DROOMS
>>>
>>>
>>> Drooms GmbH
>>> Eschersheimer Landstraße 6
>>> 60322 Frankfurt, Germany
>>> www.drooms.com
>>>
>>> Phone:
>>> Fax:
>>> Mail: d.terweh@drooms.com
>>>
>>>
>>> Subscribe to the Drooms newsletter
>> https://drooms.com/en/newsletter?utm_source=newslettersignup&utm_medium=emailsignature
>>> Drooms GmbH; Sitz der Gesellschaft / Registered Office: Eschersheimer
>> Landstr. 6, D-60322 Frankfurt am Main; Geschaeftsfuehrung / Management
>> Board: Alexandre Grellier;
>>> Registergericht / Court of Registration: Amtsgericht Frankfurt am Main,
>> HRB 76454; Finanzamt / Tax Office: Finanzamt Frankfurt am Main, USt-IdNr.:
>> DE 224007190
>>> On 21.08.19, 12:10, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
>>>
>>>     Hi,
>>>
>>>     Am 20.08.2019 um 16:09 schrieb Dominik Terweh:
>>>     >
>>>     > Dear All,
>>>     >
>>>     >
>>>     >
>>>     > I have some questions regarding processing times and anchors ("@").
>>>     >
>>>     >
>>>     >
>>>     > First of all, is it possible to define an anchor on a disjunction?
>>>     >
>>>     > What I tested was to have a simple rule (1) that should start on
>>>     > the element in the middle (2). Now this element had a variation
>>>     > (3), but I could not use the anchor in that case anymore:
>>>     >
>>>     > 1) A    B   C;       // works
>>>     >
>>>     > 2) A   @B   C;       // works
>>>     >
>>>     > 3) A @(B|D) C;       // NOT WORKING
>>>     >
>>>     > Is this behaviour intended or simply not supported?
>>>     >
>>>     > [NOTE: NOT WORKING means Eclipse does not complain, but the rule
>>>     > never matches]
>>>     >
>>>     >
>>>     >
>>>     > The above led to some testing with a different setup(4), however,
>>>     > since disjunctions don't seem to work, this was also not valid.
>>>     >
>>>     > 4) A @((B C) | (D C));   // NOT WORKING
>>>     >
>>>
>>>     Anchors at disjunct rule elements are syntactically supported but do
>>>     not work correctly. I will open a bug ticket.
>>>
>>>
>>>     >
>>>     >
>>>     > Is there a scenario where anchors are valid in and before brackets?
>>>     > From my observation, (5)-(10) all work as expected and all start
>>>     > matching on B. But do they differ in terms of processing? I
>>>     > noticed slightly longer processing times in (5) and ever so
>>>     > slightly in (6), but nothing very indicative. Could (5)-(10)
>>>     > differ in processing time?
>>>     >
>>>     > 5)   A   @B C
>>>     >
>>>     > 6)  (A   @B C)
>>>     >
>>>     > 7) @(A   @B C)
>>>     >
>>>     > 8)   A  @(B C)
>>>     >
>>>     > 9)   A @(@B C)
>>>     >
>>>     > 10)  A  (@B C)
>>>     >
>>>
>>>     Yes, since different combinations of methods are called, but I think
>>>     there should not be a big difference between (5)-(9).
>>>
>>>
>>>     >
>>>     >
>>>     > Since rule (10) works as expected, why does (11) work differently
>>>     > and start on A but not on B and D? (This would be useful in a
>>>     > scenario where B and D combined appear less often than A)
>>>     >
>>>     > 11) A  ((@B C) | (@D C));   // starts matching on A
>>>     >
>>>     >
>>>     >
>>>     >
>>>     >
>>>
>>>     I have to check that. I think (10) starts with A too.
>>>
>>>
>>>
>>>     Two comments for anchors and disjunct rule elements:
>>>
>>>     Anchors started as a manual option to optimize the rule execution
>>>     time compared to the automatic dynamic anchoring. However, the
>>>     anchor can considerably change the consequences of a rule. For me,
>>>     the anchor is more of an engineering option which can also be used
>>>     to speed up the rules.
>>>
>>>     Disjunct rule elements are not well supported and maintained in Ruta.
>>>     Their implementation is not efficient and they can lead to unintended
>>>     matches. Thus, their usage is not allowed in my team and I would not
>>>     recommend using them right now.
>>>
>>>
>>>     (I will try to find the time to improve the implementation)
>>>
>>>
>>>     Best,
>>>
>>>
>>>     Peter
>>>
>>>
>>>     > Thank you in advance for your answers,
>>>     >
>>>     > Best
>>>     >
>>>     > Dominik
>>>     >
>>>     >
>>>     --
>>>     Dr. Peter Klügl
>>>     R&D Text Mining/Machine Learning
>>>
>>>     Averbis GmbH
>>>     Salzstr. 15
>>>     79098 Freiburg
>>>     Germany
>>>
>>>     Fon: +49 761 708 394 0
>>>     Fax: +49 761 708 394 10
>>>     Email: peter.kluegl@averbis.com
>>>     Web: https://averbis.com
>>>
>>>     Headquarters: Freiburg im Breisgau
>>>     Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>>     Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>>
>>>
>>>
-- 
Dr. Peter Klügl
R&D Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.kluegl@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó

