uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: question on Ruta Query View
Date Sun, 19 Jun 2016 12:03:29 GMT
Hi,


Attachments are removed on this mailing list.


I would bet that some annotations are not visible to the rules, so they
are simply skipped and the query view returns no matches for them.


In Ruta, an annotation is invisible if its begin or end is covered by
something invisible, that is, by an annotation of a type that is
filtered. Most often, annotations are missed because they start or end
with a space or line break.


You can trim the annotations, e.g., with


RETAINTYPE(SPACE,BREAK);

tsCurrent{-> TRIM(SPACE,BREAK)};

RETAINTYPE;
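
To make the effect of trimming on the offsets concrete, here is a small
sketch in plain Java. It is not Ruta's implementation: it assumes that
SPACE and BREAK can be approximated by Character.isWhitespace(), and the
class and method names are made up for illustration.

```java
final class TrimSketch {

    /**
     * Shrinks the span [begin, end) so that it neither starts nor ends
     * with a whitespace character. Returns {newBegin, newEnd}.
     * Assumption: whitespace stands in for Ruta's SPACE and BREAK types.
     */
    static int[] trim(String text, int begin, int end) {
        // Move the begin offset forward past leading whitespace.
        while (begin < end && Character.isWhitespace(text.charAt(begin))) {
            begin++;
        }
        // Move the end offset backward past trailing whitespace.
        while (end > begin && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return new int[] { begin, end };
    }
}
```

An annotation spanning " current smoker\n" would be reduced to the span
covering "current smoker", so its boundaries no longer sit on filtered
text.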



You can use the query view for this use case. I have to mention that the
query view was built to serve as a tool during rule engineering: to get
a quick overview of the annotated documents. It does not scale with
the number of documents since there is no indexing across CASes and you
need to deserialize all CASes.

If it is fast enough, it is totally fine for counting annotations with
the query view.

You can also write a simple uimaFIT analysis engine and add it to the
pipeline or to the Ruta script. The analysis engine counts the
annotations in process() and outputs the aggregated result in
collectionProcessingComplete() (or the overridden method with the
correct name). If you want to parallelize it, you need a different
solution, for example one based on a resource.
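
The counting part of such an engine could look roughly like the sketch
below. It is deliberately free of UIMA dependencies so that it runs on
its own; the class AnnotationCounter and its methods are hypothetical
names. In an actual uimaFIT engine extending JCasAnnotator_ImplBase,
countDocument() would be called from process(JCas) with the type names
of the annotations found in that CAS, and report() would be printed from
collectionProcessComplete(), which is the name of the method in UIMA's
AnalysisComponent interface.

```java
import java.util.Map;
import java.util.TreeMap;

final class AnnotationCounter {

    // Running totals per annotation type name, kept across all documents.
    // Not thread-safe: this sketch assumes a single, non-parallel pipeline.
    private final Map<String, Integer> totals = new TreeMap<>();

    /** Adds one document's annotations; call once per CAS from process(). */
    void countDocument(Iterable<String> typeNames) {
        for (String typeName : typeNames) {
            totals.merge(typeName, 1, Integer::sum);
        }
    }

    /** Total number of annotations seen so far for the given type name. */
    int total(String typeName) {
        return totals.getOrDefault(typeName, 0);
    }

    /** One "type<TAB>count" line per type, sorted by type name. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

Because the totals live in one instance, this only gives correct numbers
when a single engine instance sees all documents.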

Best,

Peter



On 17.06.2016 at 21:21, Bonnie MacKellar wrote:
> Hi
>
> I am trying to use Ruta Query View to get a view of all matches for a
> particular annotation type across a large set of .xmi files. However,
> I am noticing something strange about Ruta Query View - it doesn't
> report lots of matches that are shown in the Annotation browser (and
> which I believe are correct matches). For example, a given annotation
> type tsCurrent has 4 matches in the file NCT0036712, but these matches
> do not appear at all in the list of results in Ruta Query View when I
> query for tsCurrent.  For some files, though, the results for all
> matches do show up, and for other files, only a partial set of matches
> are in the query results. I cannot understand why this is happening.
> Perhaps my query syntax is wrong?  I can only find the one example in
> the manual, which isn't much to go on. 
>
> I am attaching a screenshot showing the AnnotationBrowser on the top
> right in Eclipse, with all of the matches for tsCurrent, and the Ruta
> Query view on bottom, which does not contain those matches. I think it
> is easier to see the problem visually.
>
> Also, ultimately I am just trying to get a count of the number of times
> certain annotations are made across all of my files. Is there a better
> way to do that instead of Ruta Query View?  I can't find another way
> to total matches across lots of files.
>
> thanks,
> Bonnie MacKellar
>

