uima-user mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: I don't understand the benefits of CAS
Date Thu, 19 Jul 2012 07:52:48 GMT
Another way of saying this is that if the framework owns the data,
it can provide services for that data (such as serialization
and network transport) seamlessly and transparently.  If you
pass any kind of objects between components, this is generally
not possible.
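
To make this concrete, here is a minimal sketch in Python. It is a toy model, not the UIMA API: `ToyCas`, `add_annotation`, `to_json`, `from_json` and the `tokenizer` annotator are all hypothetical names invented for illustration. The point it tries to show is that when every component stores its results in one framework-defined container, the framework can serialize that container generically, without knowing anything about the individual components.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyCas:
    """Toy stand-in for a CAS (not the real UIMA class)."""
    sofa: str                                        # subject of analysis, here plain text
    annotations: list = field(default_factory=list)  # plain records shared by all components

    def add_annotation(self, type_name, begin, end, **features):
        # Results are stored as simple records, not component-specific objects.
        self.annotations.append(
            {"type": type_name, "begin": begin, "end": end, "features": features}
        )

    def to_json(self):
        # The container owns the data layout, so no per-type serialization code is needed.
        return json.dumps({"sofa": self.sofa, "annotations": self.annotations})

    @classmethod
    def from_json(cls, payload):
        data = json.loads(payload)
        return cls(sofa=data["sofa"], annotations=data["annotations"])


def tokenizer(cas):
    # Hypothetical annotator: records the span of each whitespace-separated token.
    pos = 0
    for word in cas.sofa.split():
        begin = cas.sofa.index(word, pos)
        end = begin + len(word)
        cas.add_annotation("Token", begin, end)
        pos = end


cas = ToyCas("UIMA owns the data")
tokenizer(cas)
wire = cas.to_json()               # could be shipped to a component on another machine
restored = ToyCas.from_json(wire)  # the receiving side sees the same data
```

If `tokenizer` instead returned its own custom objects, the transport layer would need serialization logic for each such class, which is the situation described above as "generally not possible".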


On 19/07/12 00:00, Marshall Schor wrote:
> On 7/18/2012 5:07 PM, Sebastian Sprenger wrote:
>> I am writing a state-of-the-art analysis of frameworks for filtering and
>> analysing information streams.
>> I don't understand why annotators (or any pre-processing components) can only
>> process objects that specify JCas.
> This is probably off-topic, and may be a detail, but the fundamental object
> container in UIMA that flows from annotator to annotator is a "CAS".  There is
> an interface for it called the JCas - which stands for Java interface to the CAS.
> So I presume in the above sentence you meant a "CAS".
>> Why is it not possible to process arbitrary 
>> objects?
> A purpose of UIMA is to enable collaboration among independently developed
> unstructured information analysis components.  In general, these components can
> be written in a variety of "languages".  UIMA, in particular, supports
> annotators written in Java and C/C++, plus some others (Python, etc).  These
> languages have different capabilities for expressing "objects".  For UIMA we
> chose an approach that puts the objects (feature structures) into the CAS.
> When writing a particular annotator, in a particular language, you are free to
> use whatever objects you desire within that annotator.  When you get or put data
> into the CAS you are "sharing" that data with other components, potentially
> developed independently, by others, in other languages.
> ==========
> You may be asking a different question, however.  You may be saying, I have a
> JPEG image, or an Audio file encoded in mp3, etc.  -- something that's not
> "text".  Examples in UIMA often appear to
> assume that the unstructured input, the "Subject of Analysis" (or SofA, as our
> documentation calls it) is a text document.  However, UIMA *does* allow
> arbitrary kinds of unstructured information for the SofA -- if that was your
> concern.  For more details, see
> http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.aas.sofa_data_formats
>> Best regards, Sebastian   
> Hope this helps.  -Marshall