uima-user mailing list archives

From Raffaella Ventaglio <ventag...@celi.it>
Subject Re: drools for annotation sequences
Date Mon, 19 Nov 2012 14:06:32 GMT
On 11/19/2012 02:01 PM, Alexander Klenner wrote:
> Hi Yason, Roberto and others,
> we ran into the same problem after one day of using Drools. Using an artificial index
> however doesn't work for us since we have many Annotations and a complex TypeSystem, we had
> to retouch all our AEs to get this index working - not possible for us.
> What do you think about an ArrayList, where all Annotations of one Type are stored in
> their order of appearance (e.g. sorted by Begin). Instead of only adding all Annotation
> classes to Drools we now also add this ArrayList, which we create in our Drools AE? Does such
> an approach make sense? We can ask for their respective ordering by using indexOf(Object).
This would work in Drools but it looks quite time consuming (if you have 
a lot of annotations, you would have a linear scan of your ArrayList for 
each /indexOf/ call).

I would try one of these:

*1.* use a /Map/ instead of an /ArrayList/ with /key = annotationId/ (or 
whatever else can identify one of your annotations) and /value = 
position/ in the list
This is quite simple to create, and position values can be retrieved more
efficiently than with /indexOf/.

*2.* use a /wrapper bean/ around your annotation so you can write your 
drools rules without using "external" info (from a /list/ or /map/)
If you have a bean like   AnnotationWrapper --> { int position,
Annotation annot }
then you can write something like

   $t1: AnnotationWrapper (annot.text == "Mr.", $pos: position)
   $t2: AnnotationWrapper (annot.ortho == "capitalized", position == ($pos + 1))
   create NE($t1.annot.start, $t2.annot.end)

I think that solution *2* is more "drools friendly" (because it can use
a direct "match" on a bean property instead of calling methods on an
/array/ or a /map/, so it should be more efficient for the Rete algorithm).
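
To make the two options a bit more concrete, here is a rough sketch in plain Java, without any UIMA or Drools dependency. Everything in it is an assumption for illustration only: Annot is a made-up stand-in for a real annotation class, PositionSketch and its methods are hypothetical helpers, and the map is keyed by the annotation object itself rather than by an id. In both cases the position is computed once, after sorting by begin offset, so no /indexOf/ call is needed later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a UIMA annotation; only offsets and text are modelled.
class Annot {
    private final int begin;
    private final int end;
    private final String text;

    Annot(int begin, int end, String text) {
        this.begin = begin;
        this.end = end;
        this.text = text;
    }

    public int getBegin() { return begin; }
    public int getEnd() { return end; }
    public String getText() { return text; }
}

// Option 2: a bean that carries the annotation together with its position,
// so a rule can constrain "position" directly.
class AnnotationWrapper {
    private final int position;
    private final Annot annot;

    AnnotationWrapper(int position, Annot annot) {
        this.position = position;
        this.annot = annot;
    }

    public int getPosition() { return position; }
    public Annot getAnnot() { return annot; }
}

class PositionSketch {
    // Returns a sorted copy (by begin offset); the input list is left untouched.
    static List<Annot> sortedByBegin(List<Annot> annots) {
        List<Annot> sorted = new ArrayList<>(annots);
        sorted.sort(Comparator.comparingInt(Annot::getBegin));
        return sorted;
    }

    // Option 1: annotation -> position, built in one pass over the sorted list.
    // Annot does not override equals/hashCode, so keys are compared by identity.
    static Map<Annot, Integer> positionMap(List<Annot> annots) {
        List<Annot> sorted = sortedByBegin(annots);
        Map<Annot, Integer> positions = new HashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            positions.put(sorted.get(i), i);
        }
        return positions;
    }

    // Option 2: one wrapper per annotation, with the position stored in the bean.
    static List<AnnotationWrapper> wrap(List<Annot> annots) {
        List<Annot> sorted = sortedByBegin(annots);
        List<AnnotationWrapper> wrappers = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            wrappers.add(new AnnotationWrapper(i, sorted.get(i)));
        }
        return wrappers;
    }
}
```

With option 2 you would insert the wrappers returned by wrap(...) into the Drools session instead of the raw annotations (plus the extra list or map), so the rule can compare position values directly.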

Hope it helps,
