uima-user mailing list archives

From Eddie Epstein <eaepst...@gmail.com>
Subject Re: Getting annotations from CASes 'external' to a pipeline
Date Thu, 15 Mar 2012 20:38:44 GMT
You cannot deserialize into a CAS obtained from getEmptyCAS(). Deserialize
into a CAS created with CasCreationUtils.createCas instead, and then use
CasCopier to copy its contents into the CAS from getEmptyCAS().

Pick the version of createCas that takes a type system, and pass it the
type system of the pipeline CAS (i.e. the one from getEmptyCAS()).
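
A minimal sketch of the steps above, not a definitive implementation. The
class name ExternalCasLoader and the method loadInto are hypothetical; the
UIMA calls (CasCreationUtils.createCas, XmiCasDeserializer.deserialize,
CasCopier.copyCas) come from uimaj-core. It assumes the external CASes were
written as XMI, that every type they use is also declared in the pipeline's
type system, and that the built-in annotation index is enough (no type
priorities or custom indexes are passed to the scratch CAS).

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.impl.XmiCasDeserializer;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.util.CasCopier;
import org.apache.uima.util.CasCreationUtils;
import org.xml.sax.SAXException;

/** Hypothetical helper: loads an external XMI CAS without touching the pipeline CAS. */
public final class ExternalCasLoader {

    private ExternalCasLoader() {
    }

    /**
     * @param xmi         stream holding the serialized external CAS (assumed to be XMI)
     * @param pipelineCas any CAS from the pipeline; only its type system is used
     * @param target      the CAS to fill, e.g. one returned by getEmptyCAS()
     */
    public static void loadInto(InputStream xmi, CAS pipelineCas, CAS target)
            throws ResourceInitializationException, SAXException, IOException {
        // Standalone scratch CAS that reuses the pipeline's type system.
        // Assumption: no type priorities, custom indexes, or tuning settings needed.
        CAS scratch = CasCreationUtils.createCas(
                pipelineCas.getTypeSystem(), null, null, null);

        // Deserializing resets the CAS it loads into, so do it on the scratch CAS.
        XmiCasDeserializer.deserialize(xmi, scratch);

        // Copy sofa data and indexed feature structures into the target CAS.
        CasCopier.copyCas(scratch, target, true);
    }
}
```

In a CAS Multiplier, target would be the CAS returned by getEmptyCAS() (or
getEmptyJCas().getCas()), and pipelineCas the CAS handed to process(). The
pipeline CAS itself is only read for its type system, so downstream AEs
still see it unchanged.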

On Thu, Mar 15, 2012 at 2:44 PM, Eric Riebling <er1k@cs.cmu.edu> wrote:
> Thanks, guys.  This is getting me closer to the goal, and explains the
> observed behaviors.  Now I'm facing issues when implementing this as a CAS
> Multiplier.  I try creating a new CAS first with getEmptyJCas().
> Here are the strategies I tried and what resulted:
>  * create a deserializer with the typesystem from the AE (which
>        includes types in the 'external' CAS to be deserialized)
>  * use it to deserialize into the empty CAS created with getEmptyJCas()
>  -> The deserialized CAS for some reason has only the base TOP typesystem
>  -> Trying to access an annotation from an index (that should be there)
>    generates the "used in Java code, but was not declared in the XML type
>    descriptor" exception
>  * same as above, but use CasCopier to try and copy the type system
>        (and everything else) from the CAS in the AE's process() method
>          into the empty CAS
>  -> Attempted to copy a FeatureStructure of type "(my type name)", which is
>    not defined in the type system of the destination CAS.
> It seems the ONLY way to obtain a CAS (empty or otherwise) whose type system
> can accept the external CAS being deserialized is to use the very CAS passed
> into the AE's process() method.  Doing so obviously mangles that CAS for the
> rest of the pipeline.
> On 3/15/2012 1:50 PM, Marshall Schor wrote:
>> On 3/15/2012 10:38 AM, Eric Riebling wrote:
>>> I have a pipeline with its own type system.
>>> I also have serialized, annotated CASes on disk with a different type
>>> system.
>>> Suppose I want an Analysis Engine in the pipeline to read in the
>>> serialized CASes in order to obtain annotations and 'do things with them'.
>>> I understand some limitations in the UIMA framework prevent this, but
>>> could it be done by making the first type system include that of the
>>> CASes to deserialize?
>> Yes, I think so.
>>> Also, it would necessitate creating new CASes within the Analysis Engine.
>>> I could think of several approaches, and have tried some without success:
>>> * Create a new, 'temporary' View in the AE's process() method, obtain a
>>> JCas, obtain its CAS, and use that to store the deserialized CASes
>>> (seems to mangle the original CAS and break downstream AEs in the
>>> pipeline, and seems unable to find any annotations in the deserialized CAS)
>> This won't work. The deserialize method effectively "resets" the CAS
>> before loading it.
>> A view is not a new CAS; it is a new view of the same CAS.
>>> * Use the CAS in the process() method to store the deserialized CASes
>>> (also mangles the original CAS, breaks downstream AEs, but DOES
>>> permit obtaining annotations from the deserialized CASes)
>> Right, deserializing into an existing CAS resets it in flight.
>>> * Make the Analysis Engine be a CAS Multiplier, and deserialize into
>>> a CAS created with getEmptyCAS()
>>> (I haven't tried this yet)
>> Yes, this is the way to get a separate CAS instance to deserialize into.
>> It's how Collection Readers do it.
>> -Marshall
>>> It's kind of a use case for a hybrid Component that behaves in some ways
>>> like an AE (has a process() method), in some ways like an XMI Collection
>>> Reader, and in some ways like a CAS Multiplier.
>>> But it's a useful use case! It is also a very bizarre one because you
>>> could almost think of it as a pipeline within a pipeline, which processes
>>> a set of serialized annotated XMI documents, within a pipeline that
>>> processes ... in our case, a Question Answering system with question
>>> keyterms, ranked lists of documents and answer candidates.
