uima-user mailing list archives

From Alexander Klenner <alexander.garvin.klen...@scai.fraunhofer.de>
Subject Re: drools for annotation sequences
Date Mon, 19 Nov 2012 13:01:31 GMT
Hi Yasen, Roberto and others,

we ran into the same problem after one day of using Drools. An artificial index, however,
doesn't work for us: we have many Annotations and a complex TypeSystem, so we would have to
retouch all our AEs to get this index working, which is not possible for us.

What do you think about an ArrayList in which all Annotations of one type are stored in their
order of appearance (e.g. sorted by begin)? Instead of adding only the Annotations themselves
to Drools, we now also add this ArrayList, which we create in our Drools AE. Does such an
approach make sense? We can then ask for an Annotation's position in the ordering by using
indexOf(Object).
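
A minimal sketch of the list idea, in plain Java without any UIMA or Drools dependency. The
classes Span and OrderedSpans and their methods are hypothetical stand-ins made up for this
illustration; in a real AE the list would hold the actual Annotation objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a UIMA annotation; only the offsets matter here.
class Span {
    final int begin;
    final int end;
    final String text;

    Span(int begin, int end, String text) {
        this.begin = begin;
        this.end = end;
        this.text = text;
    }
}

// Hypothetical helper: all annotations of one type, sorted by begin offset.
class OrderedSpans {
    private final List<Span> sorted;

    OrderedSpans(List<Span> spans) {
        sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.begin));
    }

    // Position of the span in document order, or -1 if it is not in the list.
    int positionOf(Span s) {
        return sorted.indexOf(s);
    }

    // True if 'second' is the annotation directly after 'first'.
    boolean follows(Span first, Span second) {
        int i = sorted.indexOf(first);
        return i >= 0 && sorted.indexOf(second) == i + 1;
    }
}
```

Note that ArrayList.indexOf is a linear scan that compares elements with equals(), so with
many annotations a precomputed map from annotation to position may be cheaper.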

Best regards,

Alex

 


--
Alexander G. Klenner
Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
Schloss Birlinghoven, D-53754 Sankt Augustin
Tel.: +49 - 2241 - 14 - 2736
E-mail: alexander.garvin.klenner@scai.fraunhofer.de
Internet: http://www.scai.fraunhofer.de


----- Original Message -----
From: "Roberto Franchini" <franchini@celi.it>
To: user@uima.apache.org, "Yasen Kiprov" <yasenkiprov@yahoo.com>
Sent: Monday, November 19, 2012 12:25:02 PM
Subject: Re: drools for annotation sequences

On Mon, Nov 19, 2012 at 10:06 AM, Yasen Kiprov <yasenkiprov@yahoo.com> wrote:
> Hello,
>
[cut]

>
> when
> Token with text == Mr. and number i
> Token with capital letter and number i + 1
> ...
>
> But it doesn't look right.
>
> Does anyone have any idea how such patterns can be modeled with Drools?

We do it the same way.
Every annotation emitted before the Drools grammars, and by the grammars
too, has two additional features, posBegin and posEnd, where "pos" stands
for "position".
A single token has posBegin equal to posEnd, while a sentence has
posEnd greater than posBegin.

So, in the "when" part of a rule, you can match a sequence of tokens (pseudo code):

Token $t1 with text == "Mr." and $posBegin = posBegin
Token $t2 with ortho == capitalized and posBegin == $posBegin + 1

And in the "then" part, emit new annotations:
then
emit NE($t1.getStart, $t2.getEnd)
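
The matching condition above can be sketched in plain Java, outside of Drools, to show what
the rule is expected to do. The names PosToken, MrRule and match are hypothetical and exist
only for this illustration; they are not part of UIMA, Drools, or the type system described
in this message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type carrying the extra position features.
class PosToken {
    final String text;
    final int start;     // character offset where the token starts
    final int end;       // character offset where the token ends
    final int posBegin;  // ordinal position in the token sequence
    final int posEnd;    // equal to posBegin for a single token

    PosToken(String text, int start, int end, int pos) {
        this.text = text;
        this.start = start;
        this.end = end;
        this.posBegin = pos;
        this.posEnd = pos;
    }

    boolean isCapitalized() {
        return !text.isEmpty() && Character.isUpperCase(text.charAt(0));
    }
}

class MrRule {
    // Returns {start, end} offsets for each "Mr." followed directly by a capitalized token.
    static List<int[]> match(List<PosToken> tokens) {
        List<int[]> found = new ArrayList<>();
        for (PosToken t1 : tokens) {
            if (!t1.text.equals("Mr.")) {
                continue;
            }
            for (PosToken t2 : tokens) {
                // Same join as in the pseudo code: posBegin == $posBegin + 1
                if (t2.isCapitalized() && t2.posBegin == t1.posBegin + 1) {
                    found.add(new int[] { t1.start, t2.end });
                }
            }
        }
        return found;
    }
}
```

The nested loop mirrors how the two patterns in the "when" part are joined on the position
feature; a rule engine would evaluate the same constraint over its working memory.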

And yes, we adapted our type system to be used this way.
Better still is to write a little Drools DSL that encapsulates
frequently used "when" and "then" patterns.

Hope this helps,
cheers,
FRANK


--
Roberto Franchini
The impossible is inevitable.
http://www.celi.it                     http://www.blogmeter.it
http://github.com/celi-uim       http://github.com/robfrank
Tel +39.011.562.71.15
jabber:ro.franchini@gmail.com skype:ro.franchini tw:@robfrankie
