uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: Getting annotations from CASes 'external' to a pipeline
Date Thu, 15 Mar 2012 17:50:13 GMT

On 3/15/2012 10:38 AM, Eric Riebling wrote:
> I have a pipeline with its own type system.
> I also have deserialized, annotated CASes on disk with a different type system.
> Suppose I want an Analysis Engine in the pipeline to read in the deserialized
> CASes in order to obtain annotations and 'do things with them'.
> I understand some limitations in the UIMA framework prevent this, but
> could it be done by making the first type system include that of the
> CASes to deserialize?
Yes, I think so.
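One way to get such a combined type system is sketched below. It assumes Apache UIMA's Java SDK (uimaj-core) is on the classpath: parse both type system descriptors, merge them with `CasCreationUtils.mergeTypeSystems`, and create a CAS from the result. The class `MergedTypeSystems` and its methods are hypothetical names used for illustration. Inside an aggregate pipeline the framework already merges the type systems declared by its components, so importing the XMI files' type system into one component's descriptor should normally have the same effect.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.uima.UIMAFramework;
import org.apache.uima.cas.CAS;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.util.CasCreationUtils;
import org.apache.uima.util.InvalidXMLException;
import org.apache.uima.util.XMLInputSource;

/** Hypothetical helper: builds a CAS whose type system is the union of several. */
public final class MergedTypeSystems {
  private MergedTypeSystems() {}

  /** Parses a type system descriptor, e.g. the one the XMI files were written with. */
  public static TypeSystemDescription load(String descriptorPath)
      throws IOException, InvalidXMLException {
    return UIMAFramework.getXMLParser()
        .parseTypeSystemDescription(new XMLInputSource(descriptorPath));
  }

  /** Merges the given type systems and creates an empty CAS that uses the merged result. */
  public static CAS createCombinedCas(TypeSystemDescription... parts)
      throws ResourceInitializationException {
    TypeSystemDescription merged = CasCreationUtils.mergeTypeSystems(Arrays.asList(parts));
    // No custom type priorities or index definitions in this sketch.
    return CasCreationUtils.createCas(merged, null, null);
  }
}
```

For example, `MergedTypeSystems.createCombinedCas(MergedTypeSystems.load("pipeline-ts.xml"), MergedTypeSystems.load("xmi-ts.xml"))` would yield a CAS that knows both sets of types; both descriptor paths are hypothetical.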
> Also, it would necessitate creating new CASes within the Analysis Engine.
> I could think of several approaches, and have tried some without success:
>  * Create a new, 'temporary' View in the AE's process() method, obtain a
>     JCas, obtain its CAS, and use that to store the deserialized CASes
>     (seems to mangle the original CAS and break downstream AEs in the pipeline,
>     and seems unable to find any annotations in the deserialized CAS)
This won't work.  The deserialize method effectively "resets" the CAS before 
loading it.
A view is not a new CAS; it is a new view of the same CAS.

>  * Use the CAS in the process() method to store the deserialized CASes
>     (also mangles the original CAS, breaks downstream AEs, but DOES
>     permit obtaining annotations from the deserialized CASes)
Right, deserializing into an existing CAS resets it in flight.
>  * Make the Analysis Engine be a CAS Multiplier, and deserialize into
>     a CAS created with createEmptyCas()
>     (I haven't tried this yet)
Yes, this is the way to get a separate CAS instance to deserialize into.  It's 
how Collection Readers do it.
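A minimal sketch of that approach, assuming Apache UIMA's Java SDK: a `JCasMultiplier_ImplBase` subclass leaves the incoming CAS alone and, for each XMI file in a directory, asks the framework for an empty JCas via `getEmptyJCas()` and deserializes into it with `XmiCasDeserializer`. The class name `XmiLoadingMultiplier` and the `XmiDirectory` configuration parameter are hypothetical, and the component's descriptor would also need `outputsNewCASes` set to true. Passing `true` as the lenient flag makes the deserializer ignore types missing from the pipeline's type system instead of throwing, which matters when the two type systems differ.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

import org.apache.uima.UimaContext;
import org.apache.uima.analysis_component.JCasMultiplier_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.AbstractCas;
import org.apache.uima.cas.impl.XmiCasDeserializer;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;
import org.xml.sax.SAXException;

/** Hypothetical CAS Multiplier that loads previously serialized XMI CASes. */
public class XmiLoadingMultiplier extends JCasMultiplier_ImplBase {
  private File xmiDir;
  private final Deque<File> pending = new ArrayDeque<>();

  @Override
  public void initialize(UimaContext context) throws ResourceInitializationException {
    super.initialize(context);
    // "XmiDirectory" is a hypothetical configuration parameter name.
    xmiDir = new File((String) context.getConfigParameterValue("XmiDirectory"));
  }

  @Override
  public void process(JCas jcas) throws AnalysisEngineProcessException {
    // The incoming CAS is not modified; just queue the XMI files to load.
    pending.clear();
    File[] files = xmiDir.listFiles((dir, name) -> name.endsWith(".xmi"));
    if (files != null) {
      Arrays.sort(files);
      pending.addAll(Arrays.asList(files));
    }
  }

  @Override
  public boolean hasNext() {
    return !pending.isEmpty();
  }

  @Override
  public AbstractCas next() throws AnalysisEngineProcessException {
    File file = pending.poll();
    // A separate CAS instance from the framework's pool, not a view of the input CAS.
    JCas fresh = getEmptyJCas();
    try (InputStream in = new FileInputStream(file)) {
      // lenient = true: types unknown to this CAS's type system are ignored.
      XmiCasDeserializer.deserialize(in, fresh.getCas(), true);
    } catch (IOException | SAXException e) {
      fresh.release(); // hand the unused CAS back to the pool before failing
      throw new AnalysisEngineProcessException(e);
    }
    return fresh;
  }
}
```

A CAS obtained from `getEmptyJCas()` has to be either returned from `next()` or released, as the error path above does; otherwise the CAS pool can run dry.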
> It's kind of a use case for a hybrid Component that behaves in some ways like
> an AE (has a process() method), in some ways like XMI Collection Reader, and
> in some ways like a CAS Multiplier.
> But it's a useful use case!  It is also a very bizarre one because you could
> almost think of it as a pipeline within a pipeline, which processes a set
> of deserialized annotated XMI documents, within a pipeline that processes ...
> in our case, a Question Answering system with question keyterms,
> ranked lists of documents and answer candidates.
