uima-user mailing list archives

From Eric Riebling <e...@cs.cmu.edu>
Subject Re: Getting annotations from CASes 'external' to a pipeline
Date Fri, 16 Mar 2012 21:07:33 GMT
Last one, sorry list members for the spam.

The reason things were funky is that this was a system that
used CAS Multipliers to create the CASes that my Component
was seeing.  If I run my Component in a straight-line pipeline,
getEmptyCAS() produces CASes with the full type system as it
is supposed to do.

I don't fully understand the architecture of the surrounding
system, but once I do, will supply you guys with the details in
case this is a bug with the way UIMA handles CASes that are
multiplied more than once.

On 3/16/2012 2:16 PM, Eric Riebling wrote:
> And the difference in environment:
>
> * use SimpleRunCPE - user defined types don't show up
> * use CPE GUI - they DO show up
>
> This is interesting!
>
> On 3/15/2012 6:50 PM, Eddie Epstein wrote:
>> My last note was incorrect. Here is a paraphrase of working code:
>>
>> public AbstractCas next() throws AnalysisEngineProcessException {
>>   CAS aCAS = getEmptyCAS();
>>   try {
>>     ByteArrayInputStream casIn = getNextXmiCas();
>>     // deserialize in a lenient fashion
>>     XmiCasDeserializer.deserialize(casIn, aCAS, true);
>>     return aCAS;
>>   } catch (SAXException e) {
>>     throw new AnalysisEngineProcessException(e);
>>   } catch (IOException e) {
>>     throw new AnalysisEngineProcessException(e);
>>   }
>> ...
>>
>>
>> On Thu, Mar 15, 2012 at 5:59 PM, Marshall Schor<msa@schor.com> wrote:
>>>
>>>
>>> On 3/15/2012 4:38 PM, Eddie Epstein wrote:
>>>>
>>>> Cannot deserialize into a CAS from getEmptyCas().
>>>
>>> This is not right. More information soon (ran out of time today). -Marshall
>>>
>>>> Must use a CAS from
>>>> CasCreationUtils.createCas for deserialization, and then use CasCopier
>>>> to copy to the CAS from getEmptyCas().
>>>>
>>>> Pick the version of createCas that specifies a typesystem, and use the
>>>> typesystem from the pipeline CAS (i.e. the one from getEmptyCas).
>>>>
>>>> On Thu, Mar 15, 2012 at 2:44 PM, Eric Riebling<er1k@cs.cmu.edu> wrote:
>>>>>
>>>>> Thanks, guys. This is getting me closer to the goal, and explains the
>>>>> observed behaviors. Now I'm facing issues when implemented as a CAS
>>>>> Multiplier. I try creating a new CAS first with getEmptyJCas().
>>>>>
>>>>> Here are some various strategies and what resulted:
>>>>>
>>>>> * create a deserializer with the typesystem from the AE (which
>>>>> includes types in the 'external' CAS to be deserialized)
>>>>> * use it to deserialize into the empty CAS created with getEmptyJCas()
>>>>>
>>>>> -> The deserialized CAS for some reason has only the base TOP
>>>>> typesystem
>>>>> -> Trying to access an annotation from an index (that should be there)
>>>>> generates the "used in Java code, but was not declared in the XML
>>>>> type descriptor" exception
>>>>>
>>>>> * same as above, but use CasCopier to try and copy the type system
>>>>> (and everything else) from the CAS in the AE's process() method
>>>>> into the empty CAS
>>>>>
>>>>> -> Attempted to copy a FeatureStructure of type "(my type name)",
>>>>> which is not defined in the type system of the destination CAS.
>>>>>
>>>>> It seems the ONLY way to obtain a CAS (empty or otherwise) that has the
>>>>> type system able to accept the external CAS being deserialized is to
>>>>> use the very CAS passed into the AE's process() method. Doing so
>>>>> obviously mangles that CAS for the rest of the pipeline.
>>>>>
>>>>>
>>>>> On 3/15/2012 1:50 PM, Marshall Schor wrote:
>>>>>>
>>>>>>
>>>>>> On 3/15/2012 10:38 AM, Eric Riebling wrote:
>>>>>>>
>>>>>>> I have a pipeline with its own type system.
>>>>>>> I also have serialized, annotated CASes on disk with a different
>>>>>>> type system.
>>>>>>> Suppose I want an Analysis Engine in the pipeline to read in the
>>>>>>> serialized CASes in order to obtain annotations and 'do things
>>>>>>> with them'.
>>>>>>>
>>>>>>> I understand some limitations in the UIMA framework prevent this,
>>>>>>> but could it be done by making the first type system include that
>>>>>>> of the CASes to deserialize?
>>>>>>
>>>>>> Yes, I think so.
>>>>>>>
>>>>>>>
>>>>>>> Also, it would necessitate creating new CASes within the Analysis
>>>>>>> Engine.
>>>>>>> I could think of several approaches, and have tried some without
>>>>>>> success:
>>>>>>>
>>>>>>> * Create a new, 'temporary' View in the AE's process() method,
>>>>>>> obtain a JCas, obtain its CAS, and use that to store the
>>>>>>> deserialized CASes
>>>>>>> (seems to mangle the original CAS and break downstream AEs in the
>>>>>>> pipeline, and seems to not be able to find any annotations in the
>>>>>>> deserialized CAS)
>>>>>>>
>>>>>> This won't work. The deserialize method effectively "resets" the CAS
>>>>>> before loading it.
>>>>>> A view is not a new CAS; it is a new view of the same CAS.
>>>>>>
>>>>>>> * Use the CAS in the process() method to store the deserialized CASes
>>>>>>> (also mangles the original CAS, breaks downstream AEs, but DOES
>>>>>>> permit obtaining annotations from the deserialized CASes)
>>>>>>
>>>>>> Right, deserializing into an existing CAS resets it in flight.
>>>>>>>
>>>>>>>
>>>>>>> * Make the Analysis Engine be a CAS Multiplier, and deserialize into
>>>>>>> a CAS created with getEmptyCAS()
>>>>>>> (I haven't tried this yet)
>>>>>>
>>>>>> Yes, this is the way to get a separate CAS instance to deserialize into.
>>>>>> It's how Collection Readers do it.
>>>>>> -Marshall
>>>>>>>
>>>>>>>
>>>>>>> It's kind of a use case for a hybrid Component that behaves in some
>>>>>>> ways like an AE (has a process() method), in some ways like XMI
>>>>>>> Collection Reader, and in some ways like a CAS Multiplier.
>>>>>>>
>>>>>>> But it's a useful use case! It is also a very bizarre one because you
>>>>>>> could almost think of it as a pipeline within a pipeline, which
>>>>>>> processes a set of deserialized annotated XMI documents, within a
>>>>>>> pipeline that processes ... in our case, a Question Answering system
>>>>>>> with question keyterms, ranked lists of documents and answer candidates.
>>>>>>>
>>>
>>
>
