uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: Question on Capabilities of AE descriptor
Date Thu, 06 Nov 2008 03:18:22 GMT


Bart Mellebeek wrote:
> Marshall Schor wrote:
>> Bart Mellebeek wrote:
>>
>>  
>>>> Hello,
>>>>
>>>> I have a question on the exact role of the output types in the
>>>> Capabilities of an AE descriptor that I couldn't find in the
>>>> documentation.
>>>> A strange thing happens when I try to manipulate the descriptors of
>>>> ex4/ of the tutorial in uimaj-examples. I am running
>>>> ex4/MeetingDetectorTAE.xml with UIMA Document Analyzer. When I delete
>>>> the output type RoomNumber in the Capabilities of
>>>> ex2/RoomNumberAnnotator.xml and I run ex4/MeetingDetectorTAE.xml, the
>>>> RoomNumber type is still visible in the analysis results.
>>>>       
>>>       
>>
>> I think this is because ex4/MeetingDetectorTAE.xml itself declares it
>> outputs the RoomNumber type.  The DocumentAnalyzer is just a sample
>> application that shows *selected* feature structure types - selected by
>> looking at the output capabilities of the top-most analysis engine (in
>> the case of an aggregate having "nested" components - such as you have
>> in your example).  This means that the DocumentAnalyzer may not be
>> showing all the feature structures in the CAS, but that doesn't mean
>> that those feature structures are not there.
>>
>> See the code in uimaj-tools project: in
>> src/main/org/apache/uima/tools/docanalyzer/DocumentAnalyzer.java,
>> lines 1185 - 1207.
>>
>>  
>>>> Likewise, when I delete the output types TimeAnnot and DateAnnot in
>>>> the capabilities of ex3/TutorialDateTime.xml, these types are still
>>>> visible in the analysis results.       
>>>       
>>
>> I think it is for the same reason: ex4/MeetingDetectorTAE.xml itself
>> declares that it outputs the DateAnnot and TimeAnnot feature structures.
>>
>>
>>  
>>>> Only deleting the output type DateTimeAnnot in the capabilities of
>>>> ex3/TutorialDateTime.xml seems to have an impact on the analysis
>>>> results.
>>>>       
>>>       
>> I ran the DocumentAnalyzer without modifying the examples, and the
>> DateTimeAnnot does *not* appear - this is the expected behavior
>> because it is not listed in the output capabilities of the top-most
>> analysis engine, which is what the DocumentAnalyzer looks at.  I think
>> it will not appear, even if you don't delete the output type
>> DateTimeAnnot in the capabilities of ex3/TutorialDateTime.xml.
>>  
>>>> Why is it that deleting some output types has no impact on analysis
>>>> results, while deleting other output types does have an impact? Aren't
>>>> all output types supposed to have this impact?
>>>>       
>>>       
>> The UIMA framework makes the UIMA Metadata available to applications,
>> but doesn't specify what those applications do with that data.  The
>> DocumentAnalyzer is just a sample application - built to show many of
>> the capabilities of UIMA.  It made a particular design choice - to
>> show only those annotations in the CAS whose types are declared as
>> output capabilities of the top-most component (in the case of
>> aggregates).
>> Hope that helps.
>>
>> -Marshall
>>
>>  
>>> Any help appreciated.
>>> Thanks,
>>>
>>> Bart
>>>
>>>     
>>
>>   
> Thanks for your input.
>
> I asked this question because I am trying to build a UIMA pipeline and
> the role of the AE capabilities in the intermediate annotators is not
> entirely clear to me. I was under the impression that for each
> annotator in the pipeline, the capabilities specify which are its
> input/output types. 
Yes, I think that is correct.
> However, apparently once an annotation is inside the CAS, the
> specifications in the capabilities of the AEs do not seem to be
> relevant anymore.
>
> For example, take the aggregate ex4/MeetingDetectorTAE.xml. 
> MeetingAnnotator.java uses the types RoomNumber, DateAnnot and
> TimeAnnot to detect meetings. What surprises me is that deleting the
> output type RoomNumber in ex2/RoomNumberAnnotator.xml and deleting all
> the input types in ex4/MeetingAnnotator.xml (RoomNumber, DateAnnot and
> TimeAnnot) has no effect at all on the output: meetings are still
> detected correctly although these types have been deleted from the
> capabilities. 
The use of the capabilities varies within the framework, so there is
no simple answer.
One thing that capabilities currently are *not* used for is deleting
elements out of the CAS, which is why things still work in the case
you cite.
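
To make the distinction concrete, here is a minimal sketch of the kind of
filtering a viewer can do.  It is not the DocumentAnalyzer code and it does
not use the UIMA API; the class and method names are made up for
illustration, and plain strings stand in for the tutorial's type names.
The point is only that the list of types in the CAS is never changed; the
declared outputs of the top-most descriptor just decide what gets shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not UIMA code: a viewer that shows only the types
// declared as outputs by the top-most descriptor.
public class DisplayFilterSketch {

    // Returns the subset of typesInCas that the top-level descriptor
    // declares as outputs.  typesInCas itself is left untouched.
    public static List<String> typesToDisplay(List<String> typesInCas,
                                              Set<String> topLevelOutputs) {
        List<String> shown = new ArrayList<>();
        for (String type : typesInCas) {
            if (topLevelOutputs.contains(type)) {
                shown.add(type);
            }
        }
        return shown;
    }

    public static void main(String[] args) {
        // Types the nested annotators actually put in the CAS.
        List<String> typesInCas = Arrays.asList(
                "RoomNumber", "DateAnnot", "TimeAnnot", "DateTimeAnnot");
        // Outputs declared by the aggregate; DateTimeAnnot is not listed.
        Set<String> topLevelOutputs = new HashSet<>(Arrays.asList(
                "RoomNumber", "DateAnnot", "TimeAnnot"));

        System.out.println("In CAS:    " + typesInCas);
        System.out.println("Displayed: "
                + typesToDisplay(typesInCas, topLevelOutputs));
    }
}
```

Editing the capabilities of a nested annotator changes neither of the two
collections above, which matches what you observed.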

Some things capabilities are used for include setting the default Result
Specification (see
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.result_specification_setting
).  Another one is the CapabilityLanguageFlow (search for
capabilitylanguageflow in
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/references/references.html
).
> Is this just because these annotations are already readily available
> in the CAS and if so, what exactly is the role of the capabilities and
> when should their types/features be specified?
The best practice is to use these specifications to document, for each
component, what inputs it needs and what outputs it produces.  In the
future, UIMA may be enhanced to do more with these, or tooling may be
developed that does more with this metadata (for instance, configuration
tooling that ensures a "flow" makes sense - that things needed are
produced before they're needed, etc.).
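
As a rough idea of what such a check could look like, here is a small
sketch.  It is only an illustration: no such tool is implied to exist, it
does not use the UIMA API, and the Component class and findUnmetInputs
method are invented for this example.  It walks the components in flow
order and reports every declared input that no earlier component declared
as an output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a flow sanity check based on declared capabilities.
public class FlowCheckSketch {

    // Stand-in for one component's declared input and output types.
    public static final class Component {
        final String name;
        final List<String> inputs;
        final List<String> outputs;

        public Component(String name, List<String> inputs, List<String> outputs) {
            this.name = name;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    // Returns one message per input that is needed before it is produced.
    public static List<String> findUnmetInputs(List<Component> flow) {
        Set<String> produced = new HashSet<>();
        List<String> problems = new ArrayList<>();
        for (Component c : flow) {
            for (String input : c.inputs) {
                if (!produced.contains(input)) {
                    problems.add(c.name + " needs " + input);
                }
            }
            // Outputs become available only to later components.
            produced.addAll(c.outputs);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        List<Component> flow = Arrays.asList(
                new Component("RoomNumberAnnotator", none,
                        Arrays.asList("RoomNumber")),
                new Component("TutorialDateTime", none,
                        Arrays.asList("DateAnnot", "TimeAnnot", "DateTimeAnnot")),
                // MeetingAnnotator's own outputs are left out of this sketch.
                new Component("MeetingAnnotator",
                        Arrays.asList("RoomNumber", "DateAnnot", "TimeAnnot"), none));
        System.out.println("Unmet inputs: " + findUnmetInputs(flow));
    }
}
```

A check like this only works if every component declares its inputs and
outputs accurately, which is the main reason to keep the capabilities up
to date even though the framework does not enforce them today.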
>
>
> Sorry if this is a basic question: I'm new to this.
No problem. 
> Thank you for your time,
You're welcome, and welcome to UIMA :-)

-Marshall
>
> Bart
>
>
