uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: I don't understand the benefits of CAS
Date Wed, 18 Jul 2012 22:00:16 GMT

On 7/18/2012 5:07 PM, Sebastian Sprenger wrote:
> I am writing a state-of-the-art analysis of frameworks for filtering and
> analysing information streams.
> I don't understand why annotators (or any pre-processing components) can only
> process objects that specify JCas.

This is probably off-topic, and may be a detail, but the fundamental object
container in UIMA that flows from annotator to annotator is a "CAS".  There is
an interface to it called the JCas, which stands for "Java interface to the CAS".

So I presume in the above sentence you meant a "CAS".
> Why is it not possible to process arbitrary 
> objects?

A purpose of UIMA is to enable collaboration among independently developed
unstructured information analysis components.  In general, these components can
be written in a variety of "languages".  UIMA, in particular, supports
annotators written in Java and C/C++, plus some others (Python, etc.).  These
languages have different capabilities for expressing "objects".  For UIMA we
chose an approach that puts the shared objects (feature structures) into the
CAS, where components written in any of these languages can access them.

When writing a particular annotator, in a particular language, you are free to
use whatever objects you desire within that annotator.  When you get or put data
into the CAS you are "sharing" that data with other components, potentially
developed independently, by others, in other languages.
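Here is a minimal plain-Java sketch of the point about private objects versus shared data. It does not use UIMA at all. `ToyCas`, `FeatureStructure` and `tokenAnnotator` are hypothetical names for a toy container that, like a CAS, only accepts typed records with primitive feature values. The annotator is free to use a private `java.util.regex.Matcher` internally. It shares only `begin`/`end` integers, the kind of data a component written in another language could also read.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy model, NOT the UIMA API. ToyCas and FeatureStructure are
// invented names that loosely mimic the idea of a CAS: components exchange
// only typed records with primitive feature values.
public class SharedContainerSketch {

    /** A feature structure: a type name plus named primitive-valued features. */
    static final class FeatureStructure {
        final String type;
        final Map<String, Object> features = new LinkedHashMap<>();

        FeatureStructure(String type) {
            this.type = type;
        }

        FeatureStructure set(String name, Object value) {
            // Restrict values to kinds that any language binding could represent.
            if (!(value instanceof String || value instanceof Integer
                    || value instanceof Double || value instanceof Boolean)) {
                throw new IllegalArgumentException("unsupported value for feature " + name);
            }
            features.put(name, value);
            return this;
        }
    }

    /** The shared container: the document text plus an index of feature structures. */
    static final class ToyCas {
        final String documentText;
        final List<FeatureStructure> index = new ArrayList<>();

        ToyCas(String documentText) {
            this.documentText = documentText;
        }

        void add(FeatureStructure fs) {
            index.add(fs);
        }

        List<FeatureStructure> select(String type) {
            List<FeatureStructure> result = new ArrayList<>();
            for (FeatureStructure fs : index) {
                if (fs.type.equals(type)) {
                    result.add(fs);
                }
            }
            return result;
        }
    }

    /** One "annotator": uses a private Matcher internally, shares only ints. */
    static void tokenAnnotator(ToyCas cas) {
        Matcher m = Pattern.compile("\\w+").matcher(cas.documentText);
        while (m.find()) {
            cas.add(new FeatureStructure("Token").set("begin", m.start()).set("end", m.end()));
        }
    }

    public static void main(String[] args) {
        ToyCas cas = new ToyCas("UIMA shares data via the CAS.");
        tokenAnnotator(cas);
        // A second component, possibly developed elsewhere, reads only the shared data.
        for (FeatureStructure fs : cas.select("Token")) {
            int begin = (Integer) fs.features.get("begin");
            int end = (Integer) fs.features.get("end");
            System.out.println(cas.documentText.substring(begin, end));
        }
    }
}
```

In real UIMA the shared types would be declared in a type system descriptor and accessed through the CAS or JCas interfaces. Consult the UIMA documentation, not this toy, for the actual API.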

==========

You may be asking a different question, however.  You may be saying: "I have a
JPEG image, an audio file encoded as MP3, or something else that isn't
text."  Many of the UIMA examples seem to assume that the unstructured input,
the "Subject of Analysis" (or SofA, as our documentation calls it), is a text
document.  However, UIMA *does* allow arbitrary kinds of unstructured
information for the SofA, if that was your concern.  For more details, see
http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.aas.sofa_data_formats
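The toy model below is a rough illustration that a SofA need not be text. It is again hypothetical plain Java, not the UIMA API. It tags each subject of analysis with a MIME type and holds it as a string, as inline bytes, or as a URI reference. As far as I know, the real CAS interface has its own setters for these cases (for example `setSofaDataString`, `setSofaDataArray` and `setSofaDataURI`). Check the documentation linked above for their exact signatures.

```java
// Hypothetical toy model, NOT the UIMA API: a "subject of analysis" that holds
// text, raw bytes, or a URI reference, each tagged with a MIME type.
public class SofaSketch {

    enum Kind { STRING, BYTES, URI }

    static final class Sofa {
        final Kind kind;
        final String mimeType;
        final String text;   // used when kind == STRING
        final byte[] bytes;  // used when kind == BYTES
        final String uri;    // used when kind == URI

        private Sofa(Kind kind, String mimeType, String text, byte[] bytes, String uri) {
            this.kind = kind;
            this.mimeType = mimeType;
            this.text = text;
            this.bytes = bytes;
            this.uri = uri;
        }

        static Sofa ofString(String text, String mimeType) {
            return new Sofa(Kind.STRING, mimeType, text, null, null);
        }

        static Sofa ofBytes(byte[] data, String mimeType) {
            // Defensive copy so later changes to the caller's array are not visible.
            return new Sofa(Kind.BYTES, mimeType, null, data.clone(), null);
        }

        static Sofa ofUri(String uri, String mimeType) {
            // Large media can stay outside the container and be referenced by URI.
            return new Sofa(Kind.URI, mimeType, null, null, uri);
        }
    }

    /** A component inspects the kind and MIME type before deciding how to analyse. */
    static String describe(Sofa sofa) {
        switch (sofa.kind) {
            case STRING:
                return sofa.mimeType + ": " + sofa.text.length() + " chars of text";
            case BYTES:
                return sofa.mimeType + ": " + sofa.bytes.length + " bytes inline";
            default:
                return sofa.mimeType + ": referenced at " + sofa.uri;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Sofa.ofString("Hello, UIMA", "text/plain")));
        // Placeholder bytes standing in for real image data.
        System.out.println(describe(Sofa.ofBytes(new byte[] {1, 2, 3, 4}, "image/jpeg")));
        System.out.println(describe(Sofa.ofUri("file:///tmp/example.mp3", "audio/mpeg")));
    }
}
```

Referencing large media by URI rather than copying it inline is one reasonable design choice. In this sketch, it is up to each consuming component to check the MIME type before trying to analyse the data.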

> Best regards, Sebastian
Hope this helps.  -Marshall

