uima-user mailing list archives

From Bart Mellebeek <bart.melleb...@barcelonamedia.org>
Subject Re: Question on Capabilities of AE descriptor
Date Tue, 04 Nov 2008 19:14:07 GMT
Marshall Schor wrote:
> Bart Mellebeek wrote:
>
>   
>>> Hello,
>>>
>>> I have a question on the exact role of the output types in the
>>> Capabilities of an AE descriptor that I couldn't find in the
>>> documentation.
>>> A strange thing happens when I try to manipulate the descriptors of
>>> ex4/ of the tutorial in uimaj-examples. I am running
>>> ex4/MeetingDetectorTAE.xml with UIMA Document Analyzer. When I delete
>>> the output type RoomNumber in the Capabilities of
>>> ex2/RoomNumberAnnotator.xml and I run ex4/MeetingDetectorTAE.xml, the
>>> RoomNumber type is still visible in the analysis results.
>>>       
>>   
>>     
>
> I think this is because ex4/MeetingDetectorTAE.xml itself declares it
> outputs the RoomNumber type.  The DocumentAnalyzer is just a sample application
> that shows *selected* feature structure types - selected by looking at the
> output capabilities of the top-most analysis engine (in the case of an aggregate
> having "nested" components - such as you have in your example).  This means that
> the DocumentAnalyzer may not be showing all the feature structures in the CAS,
> but that doesn't mean that those feature structures are not there.
>
> See the code in the uimaj-tools project, in
> src/main/org/apache/uima/tools/docanalyzer/DocumentAnalyzer.java,
> lines 1185 - 1207.
>
>   
>>> Likewise, when I delete the output types TimeAnnot and DateAnnot in
>>> the capabilities of ex3/TutorialDateTime.xml, these types are still
>>> visible in the analysis results. 
>>>       
>>   
>>     
>
> I think for the same reason - the ex4/MeetingDetectorTAE.xml itself
> declares it outputs the DateAnnot and TimeAnnot feature structures.
>
>
>   
>>> Only deleting the output type DateTimeAnnot in the capabilities of
>>> ex3/TutorialDateTime.xml seems to have an impact on the analysis results.
>>>       
>>   
>>     
> I ran the DocAnalyzer without modifying the examples, and the
> DateTimeAnnot does *not* appear - this is the expected behavior because
> it is not listed in the DocumentAnalyzer's output capabilities.  I think
> it will not appear, even if you don't delete the output type
> DateTimeAnnot in the capabilities of ex3/TutorialDateTime.xml.
>
>   
>>> Why is it that deleting some output types has no impact on analysis
>>> results, while deleting other output types does have an impact? Aren't
>>> all output types supposed to have this impact?
>>>       
>>   
>>     
> The UIMA framework makes the UIMA Metadata available to applications,
> but doesn't specify what those applications do with that data.  The
> DocumentAnalyzer is just a sample application - built to show many of
> the capabilities of UIMA.  It took a particular design choice - to show
> annotations in the CAS that were specified as output capabilities of
> the top-most component (in the case of aggregates).
>
> Hope that helps.
>
> -Marshall
>
>   
>> Any help appreciated.
>> Thanks,
>>
>> Bart
>>
>>     
>
>   
Thanks for your input.

I asked this question because I am trying to build a UIMA pipeline and 
the role of the AE capabilities in the intermediate annotators is not 
entirely clear to me. I was under the impression that for each annotator 
in the pipeline, the capabilities specify its input and output types. 
However, once an annotation is inside the CAS, the capabilities declared 
by the AEs no longer seem to matter.

For example, take the aggregate ex4/MeetingDetectorTAE.xml.  
MeetingAnnotator.java uses the types RoomNumber, DateAnnot and TimeAnnot 
to detect meetings. What surprises me is that deleting the output type 
RoomNumber in ex2/RoomNumberAnnotator.xml and deleting all the input 
types in ex4/MeetingAnnotator.xml (RoomNumber, DateAnnot and TimeAnnot) 
has no effect at all on the output: meetings are still detected 
correctly although these types have been deleted from the capabilities. 
Is this just because these annotations are already readily available in 
the CAS and if so, what exactly is the role of the capabilities and when 
should their types/features be specified?
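
The behaviour described above can be pictured with a toy model in plain Java. This is only a sketch of the idea and does not use the UIMA API: the class CapabilityFilterSketch, its methods, and the choice of which types the top-level descriptor declares are all assumptions made for illustration. It shows annotators writing to and reading from a shared store without consulting any declared capabilities, while a viewer (as Marshall describes for the DocumentAnalyzer) displays only the types listed as outputs of the top-most component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names belong to the UIMA API.
class CapabilityFilterSketch {

    // Stand-in for the CAS: a list of annotation type names added by annotators.
    static List<String> runPipeline() {
        List<String> cas = new ArrayList<>();
        // Upstream annotators add their annotations; nothing here checks
        // what their descriptors declare as outputs.
        cas.add("RoomNumber");
        cas.add("DateAnnot");
        cas.add("TimeAnnot");
        // The downstream annotator reads whatever is present in the store,
        // regardless of its declared inputs.
        if (cas.contains("RoomNumber") && cas.contains("DateAnnot")
                && cas.contains("TimeAnnot")) {
            cas.add("Meeting");
        }
        return cas;
    }

    // Stand-in for the viewer: keep only the types that the top-level
    // component declares as outputs.
    static List<String> visibleTypes(List<String> cas, Set<String> topLevelOutputs) {
        List<String> shown = new ArrayList<>();
        for (String type : cas) {
            if (topLevelOutputs.contains(type)) {
                shown.add(type);
            }
        }
        return shown;
    }

    public static void main(String[] args) {
        List<String> cas = runPipeline();
        // Hypothetical top-level declaration that leaves out DateAnnot and TimeAnnot.
        Set<String> declared = new HashSet<>(Arrays.asList("Meeting", "RoomNumber"));
        System.out.println(cas);                         // [RoomNumber, DateAnnot, TimeAnnot, Meeting]
        System.out.println(visibleTypes(cas, declared)); // [RoomNumber, Meeting]
    }
}
```

In this model, DateAnnot and TimeAnnot are still in the store and still used to detect the meeting; they are merely hidden from the display because the hypothetical top-level declaration omits them.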

Sorry if this is a basic question: I'm new to this.
Thank you for your time,

Bart
