uima-user mailing list archives

From Nikolai Krot <tal...@gmail.com>
Subject Re: Usage of anchors
Date Thu, 29 Aug 2019 09:59:15 GMT
Hi Peter,

I have a question about this comment of yours:

> ... but the matching using literal string expressions is still really
> inefficient.

What do you mean by "inefficient"? Do you mean it is slow? Say I want to
use a literal in one hundred rules. Which is the better strategy:
1) writing the string literally in each of these 100 rules; or
2) annotating the string (using MARKTABLE) and then using the annotation in
these 100 rules?
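
To make the two options concrete, here is a minimal sketch of what I mean. The types Keyword and Hit, the table file keywords.csv and the literal "invoice" are made up for illustration, and I am assuming that the short form of MARKTABLE (type, column index, table) is sufficient here:

```
// Hypothetical types and resource names, for illustration only.
DECLARE Keyword, Hit;
WORDTABLE KeywordTable = 'keywords.csv';

// Option 1: the literal string is written out in every rule that needs it.
"invoice" NUM{-> MARK(Hit, 1, 2)};

// Option 2: annotate the string once (assumed: it is listed in column 1
// of the table), then let the rules match on the annotation type.
Document{-> MARKTABLE(Keyword, 1, KeywordTable)};
Keyword NUM{-> MARK(Hit, 1, 2)};
```

With 100 rules, option 1 repeats the literal 100 times, while option 2 runs the table lookup once and the rules only refer to Keyword.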

Best regards,
Nikolai

On Mon, Aug 26, 2019 at 2:27 PM Peter Klügl <peter.kluegl@averbis.com>
wrote:

> Hi,
>
>
> On 21.08.2019 at 15:47, Dominik Terweh wrote:
> > Hi Peter,
> >
> > Thanks a lot for the clarification. I was wondering about (10) too.
> >
> > Following your explanation, I was wondering: does it make sense to
> > anchor sequences, such as in (8), and is it "legal" to use multiple
> > anchors in a hierarchical fashion?
> > Like A @(B @C D)?
>
> Yes, it is "legal", but you have to be careful. (There are not enough
> unit tests for those rules)
>
>
> >
> > Also, is there a difference between the processing of sequences of
> > annotations or literals (given "A" is annotated as A and so on)?
> > A @(B C D)
> > Vs
> > "A" @("B" "C" "D")
> > Vs
> > A @("B" C "D")
>
>
> It should not make a difference for the result, but the matching using
> literal string expressions is still really inefficient.
>
>
> Best,
>
>
> Peter
>
>
> >
> > Best
> > Dominik
> >
> >
> >
> > Dominik Terweh
> > Praktikant
> >
> > DROOMS
> >
> >
> > Drooms GmbH
> > Eschersheimer Landstraße 6
> > 60322 Frankfurt, Germany
> > www.drooms.com
> >
> > Phone:
> > Fax:
> > Mail: d.terweh@drooms.com
> >
> >
> > Subscribe to the Drooms newsletter
> >>>>
> https://drooms.com/en/newsletter?utm_source=newslettersignup&utm_medium=emailsignature
> > Drooms GmbH; Sitz der Gesellschaft / Registered Office: Eschersheimer
> Landstr. 6, D-60322 Frankfurt am Main; Geschaeftsfuehrung / Management
> Board: Alexandre Grellier;
> > Registergericht / Court of Registration: Amtsgericht Frankfurt am Main,
> HRB 76454; Finanzamt / Tax Office: Finanzamt Frankfurt am Main, USt-IdNr.:
> DE 224007190
> >
> > On 21.08.19, 12:10, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
> >
> >     Hi,
> >
> >     On 20.08.2019 at 16:09, Dominik Terweh wrote:
> >     >
> >     > Dear All,
> >     >
> >     >
> >     >
> >     > I have some questions regarding processing times and anchors ("@").
> >     >
> >     >
> >     >
> >     > First of all, is it possible to define an anchor on a disjunction?
> >     >
> >     > What I tested was a simple rule (1) that should start on the
> >     > element in the middle (2). This element then got a variation (3),
> >     > but I could no longer use the anchor in that case:
> >     >
> >     > 1) A    B   C;       // works
> >     >
> >     > 2) A   @B   C;       // works
> >     >
> >     > 3) A @(B|D) C;       // NOT WORKING
> >     >
> >     > Is this behaviour intended or simply not supported?
> >     >
> >     > [NOTE: NOT WORKING means Eclipse does not complain, but the rule
> >     > never matches]
> >     >
> >     >
> >     >
> >     > The above led to some testing with a different setup (4). However,
> >     > since disjunctions don't seem to work, this was not valid either.
> >     >
> >     > 4) A @((B C) | (D C));   // NOT WORKING
> >     >
> >
> >     Anchors at disjunct rule elements are syntactically supported but do
> >     not work correctly. I will open a bug ticket.
> >
> >
> >     >
> >     >
> >     > Is there a scenario where anchors are valid in and before brackets?
> >     > From my observation, (5)-(10) all work as expected and all start
> >     > matching on B. But do they differ in terms of processing? I noticed
> >     > slightly longer processing times in (5) and, ever so slightly, in
> >     > (6), but nothing conclusive. Could (5)-(10) differ in processing
> >     > time?
> >     >
> >     > 5)   A   @B C
> >     >
> >     > 6)  (A   @B C)
> >     >
> >     > 7) @(A   @B C)
> >     >
> >     > 8)   A  @(B C)
> >     >
> >     > 9)   A @(@B C)
> >     >
> >     > 10)  A  (@B C)
> >     >
> >
> >     Yes, since different combinations of methods are called, but I think
> >     there should not be a big difference between (5)-(9).
> >
> >
> >     >
> >     >
> >     > Since rule (10) works as expected, why does (11) work differently
> >     > and start on A rather than on B and D? (This would be useful in a
> >     > scenario where B and D combined appear less often than A.)
> >     >
> >     > 11) A  ((@B C) | (@D C));   // starts matching on A
> >     >
> >     >
> >     >
> >     >
> >     >
> >
> >     I have to check that. I think (10) starts with A too.
> >
> >
> >
> >     Two comments for anchors and disjunct rule elements:
> >
> >     Anchors started as a manual option to optimize the rule execution
> >     time compared to the automatic dynamic anchoring. However, the anchor
> >     can considerably change the consequences of a rule. For me, the anchor
> >     is more of an engineering option which can also be used to speed up
> >     the rules.
> >
> >
> >     Disjunct rule elements are not well supported and maintained in Ruta.
> >     Their implementation is not efficient and they can lead to unintended
> >     matches. Thus, their usage is not allowed in my team and I would not
> >     recommend using them right now.
> >
> >
> >     (I will try to find the time to improve the implementation)
> >
> >
> >     Best,
> >
> >
> >     Peter
> >
> >
> >     > Thank you in advance for your answers,
> >     >
> >     > Best
> >     >
> >     > Dominik
> >     >
> >     --
> >     Dr. Peter Klügl
> >     R&D Text Mining/Machine Learning
> >
> >     Averbis GmbH
> >     Salzstr. 15
> >     79098 Freiburg
> >     Germany
> >
> >     Fon: +49 761 708 394 0
> >     Fax: +49 761 708 394 10
> >     Email: peter.kluegl@averbis.com
> >     Web: https://averbis.com
> >
> >     Headquarters: Freiburg im Breisgau
> >     Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
> >     Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
> >
> >
> >
