uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: question on Ruta Query View
Date Thu, 23 Jun 2016 08:36:55 GMT
Sorry for the delayed response.

It all depends on the filtering settings of the analysis engine and the
script. You can create annotations that are normally not visible to
common Ruta rules. The Annotation Browser just displays all of them, but
that does not mean that the rules can match on these annotations. The
Query View uses a default Ruta analysis engine with the default filtering
settings, which means that annotations starting or ending with
whitespace, line breaks, or markup are not visible and will be skipped.
It is not yet possible to reconfigure the Query View analysis engine
(I think). As I mentioned before, the Query View does not list the
annotations of the type you query, but returns the matches of the
rule you query. Can you check whether the missing annotations start or
end with something invisible? Just in case the problem is caused by
something else...
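To make the visibility rule concrete, here is a deliberately simplified Java model of the default filtering described above. It is not Ruta's implementation: it assumes an annotation is skipped only when its first or last character is whitespace or a line break, ignores markup and other filtered types, and uses a hypothetical `Span` record in place of a real UIMA annotation. It contrasts listing every annotation (what the Annotation Browser does) with keeping only those a rule could match (roughly what the Query View returns).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not Ruta's actual code) of default visibility:
// an annotation is invisible to rules if its first or last character
// is whitespace or a line break. Markup filtering is ignored here.
public class VisibilityModel {

    // Hypothetical stand-in for an annotation: a type name plus character offsets.
    record Span(String type, int begin, int end) {}

    // True if the span's first and last characters are both visible.
    static boolean isVisible(String text, Span s) {
        if (s.begin() >= s.end()) {
            return false; // assumption of this model: empty spans count as invisible
        }
        char first = text.charAt(s.begin());
        char last = text.charAt(s.end() - 1);
        return !Character.isWhitespace(first) && !Character.isWhitespace(last);
    }

    // "Annotation Browser" view: every annotation, no filtering.
    static List<Span> listAll(List<Span> spans) {
        return new ArrayList<>(spans);
    }

    // "Query View" view: only annotations a rule with default filtering could match.
    static List<Span> listMatchable(String text, List<Span> spans) {
        List<Span> result = new ArrayList<>();
        for (Span s : spans) {
            if (isVisible(text, s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "current smoker\nformer smoker";
        List<Span> spans = List.of(
                new Span("tsCurrent", 0, 14),  // "current smoker"
                new Span("tsCurrent", 0, 15)); // "current smoker\n", ends with a line break
        System.out.println(listAll(spans).size());             // prints 2
        System.out.println(listMatchable(text, spans).size()); // prints 1
    }
}
```

Both spans exist in the model, but only the one without the trailing line break survives the filter, which matches the symptom of annotations that show up in the Annotation Browser but not in the Query View.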



On 19.06.2016 at 15:15, Bonnie MacKellar wrote:
> I am sorry, I am now really confused. I have a Ruta script which annotates
> a bunch of text files, resulting in .xmi files which I assume contain the
> annotations. When I open an .xmi file in the Annotation Browser, it shows
> all of the annotations produced by my script, right? It certainly looks
> correct. I have checked them pretty carefully.
> Since I must specify .xmi files for the Query View as well, I was assuming
> it is also listing the annotations in those same files.
> Yes, I know I can use uimaFIT, but since I have a lot of types, I am
> dreading the configuration task. I just wanted some quick totals, and had
> hoped I could do it in a few minutes with the Query View. Why are
> annotations made invisible if they end with a line break? That caused
> me no end of grief when I was developing my script. It seems unexpected.
> thanks,
> Bonnie MacKellar
> On Sun, Jun 19, 2016 at 8:52 AM, Peter Klügl <peter.kluegl@averbis.com>
> wrote:
>> Hi,
>> the Annotation Browser just lists all annotations in the CAS. It is
>> completely independent of the Ruta language and just an extension of the
>> CAS Editor. The Query View applies rules on a CAS and lists the rule
>> matches. So the Query View is much more powerful than the Annotation
>> Browser, since it can use the complete expressiveness of the language.
>> However, that is also the reason why it is sensitive to the visibility
>> concept.
>> Best,
>> Peter
>> On 19.06.2016 at 14:39, Bonnie MacKellar wrote:
>>> The idea that spaces are making the annotations invisible is totally
>>> plausible. But why does the Annotation Browser see them then? The
>>> annotations are there (they haven't been skipped); just the Query View is
>>> not picking them up. What is different about the Annotation Browser that
>>> would make those annotations not visible?
>>> thanks,
>>> Bonnie MacKellar
>>> On Sun, Jun 19, 2016 at 8:03 AM, Peter Klügl <peter.kluegl@averbis.com>
>>> wrote:
>>>> Hi,
>>>> attachments are removed on this mailing list.
>>>> I would bet that some annotations are not visible to the rules, so they
>>>> are simply skipped -> the Query View returns no matches.
>>>> In Ruta, annotations are invisible if their begin or end is covered by
>>>> something invisible, that is, by any annotation of a type that is filtered.
>>>> Most often, the annotations are missed because they start or end with a
>>>> space or line break.
>>>> You can trim annotations, e.g., with
>>>> tsCurrent{-> TRIM(SPACE,BREAK)};
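As an illustration only, the following Java sketch models the effect of that TRIM rule on an annotation's character offsets: the begin moves forward and the end moves backward past whitespace and line breaks. The class and method names are hypothetical and this is not Ruta's implementation; how Ruta treats a span that is entirely whitespace is not covered here, and this sketch simply collapses such a span to an empty one.

```java
// Hypothetical model of what tsCurrent{-> TRIM(SPACE,BREAK)}; does to offsets.
// Only whitespace/line breaks are trimmed; markup is ignored in this sketch.
public class TrimModel {

    // Returns {newBegin, newEnd} with leading and trailing whitespace removed.
    static int[] trim(String text, int begin, int end) {
        int b = begin;
        int e = end;
        // Skip leading whitespace and line breaks.
        while (b < e && Character.isWhitespace(text.charAt(b))) {
            b++;
        }
        // Skip trailing whitespace and line breaks.
        while (e > b && Character.isWhitespace(text.charAt(e - 1))) {
            e--;
        }
        return new int[] {b, e};
    }

    public static void main(String[] args) {
        String text = " current smoker\n";
        int[] trimmed = trim(text, 0, text.length());
        System.out.println(text.substring(trimmed[0], trimmed[1])); // prints "current smoker"
    }
}
```

In this model, both boundaries of the trimmed annotation land on visible characters, which is why trimming makes such annotations matchable again under the default filtering.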
>>>> You can use the Query View for this use case. I have to mention that the
>>>> Query View was built to serve as a tool during rule engineering: to get
>>>> a quick overview of the annotated documents. It does not scale with
>>>> the number of documents, since there is no indexing across CASes and you
>>>> need to deserialize all CASes.
>>>> If it is fast enough, it is totally fine to count annotations with
>>>> the Query View.
>>>> You can also write a simple uimaFIT analysis engine and add it to the
>>>> pipeline or to the Ruta script. The analysis engine counts the
>>>> annotations in process() and outputs the aggregated result in
>>>> collectionProcessingComplete() (or the overridden method with the
>>>> correct name). If you want to parallelize it, you need a different
>>>> solution with a resource or something.
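The counting pattern can be sketched without any UIMA dependency. The class below is hypothetical: its `process` and `collectionProcessComplete` methods only mirror the names of UIMA's lifecycle hooks (in UIMA's `AnalysisComponent` API the end-of-collection hook is `collectionProcessComplete()`), and each "document" is just a list of type names. A real uimaFIT component would instead extend `org.apache.uima.fit.component.JCasAnnotator_ImplBase` and iterate the CAS, e.g., with `JCasUtil.select(jcas, Annotation.class)`. `tsFormer` is a made-up type name used only for the example.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Dependency-free sketch of "count in process(), report at the end".
// Method names mirror UIMA's lifecycle, but no UIMA classes are used.
public class TypeCounter {

    // Running totals across all documents, sorted by type name.
    // Not thread-safe: a single instance must see every document.
    private final Map<String, Integer> totals = new TreeMap<>();

    // Mirrors process(): called once per document with its annotation types.
    public void process(List<String> annotationTypes) {
        for (String type : annotationTypes) {
            totals.merge(type, 1, Integer::sum);
        }
    }

    // Mirrors collectionProcessComplete(): called once after the last document.
    public void collectionProcessComplete() {
        totals.forEach((type, count) -> System.out.println(type + "\t" + count));
    }

    // Read-only view of the aggregated counts.
    public Map<String, Integer> getTotals() {
        return Collections.unmodifiableMap(totals);
    }

    public static void main(String[] args) {
        TypeCounter counter = new TypeCounter();
        counter.process(List.of("tsCurrent", "tsCurrent", "tsFormer")); // document 1
        counter.process(List.of("tsCurrent"));                           // document 2
        counter.collectionProcessComplete();
        // prints "tsCurrent\t3" then "tsFormer\t1"
    }
}
```

Because the totals live in one instance, this only works when that instance sees every document; as noted above, a parallel setup would need a shared resource instead.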
>>>> Best,
>>>> Peter
>>> On 17.06.2016 at 21:21, Bonnie MacKellar wrote:
>>>>> Hi
>>>>> I am trying to use Ruta Query View to get a view of all matches for a
>>>>> particular annotation type across a large set of .xmi files. However,
>>>>> I am noticing something strange about Ruta Query View: it doesn't
>>>>> report lots of matches that are shown in the Annotation Browser (and
>>>>> which I believe are correct matches). For example, a given annotation
>>>>> type tsCurrent has 4 matches in the file NCT0036712, but these matches
>>>>> do not appear at all in the list of results in Ruta Query View when I
>>>>> query for tsCurrent. For some files, though, the results for all
>>>>> matches do show up, and for other files, only a partial set of matches
>>>>> is in the query results. I cannot understand why this is happening.
>>>>> Perhaps my query syntax is wrong? I can only find the one example in
>>>>> the manual, which isn't much to go on.
>>>>> I am attaching a screenshot showing the Annotation Browser on the top
>>>>> right in Eclipse, with all of the matches for tsCurrent, and the Ruta
>>>>> Query View on the bottom, which does not contain those matches. I think
>>>>> it is easier to see the problem visually.
>>>>> Also, ultimately I am just trying to get a count of the number of times
>>>>> certain annotations are made across all of my files. Is there a better
>>>>> way to do that instead of Ruta Query View? I can't find another way
>>>>> to total matches across lots of files.
>>>>> thanks,
>>>>> Bonnie MacKellar
