uima-user mailing list archives

From Eric Riebling <e...@cs.cmu.edu>
Subject Re: Getting annotations from CASes 'external' to a pipeline
Date Fri, 16 Mar 2012 18:16:11 GMT
And the difference in environment:

  * use SimpleRunCPE - user defined types don't show up
  * use CPE GUI      - they DO show up

This is interesting!

On 3/15/2012 6:50 PM, Eddie Epstein wrote:
> My last note was incorrect. Here is a paraphrase of working code:
>
>    public AbstractCas next() throws AnalysisEngineProcessException {
>      CAS aCAS = getEmptyCAS();
>      try {
>        ByteArrayInputStream casIn = getNextXmiCas();
>        // deserialize in a lenient fashion
>        XmiCasDeserializer.deserialize(casIn, aCAS, true);
>        return aCAS;
>      } catch (SAXException e) {
>        throw new AnalysisEngineProcessException(e);
>      } catch (IOException e) {
>        throw new AnalysisEngineProcessException(e);
>      }
> ...
>
>
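A note on the third argument in the snippet above: passing true to XmiCasDeserializer.deserialize selects lenient mode, in which types the target CAS does not declare are ignored instead of raising an exception. The sketch below is a toy model of that behavior in plain Java. It is not UIMA code and does not reflect how UIMA implements it; the class name LenientLoadSketch, the method load, and the type name my.project.Keyterm are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: plain Java, no UIMA classes. All names here are hypothetical.
final class LenientLoadSketch {

    /**
     * Keeps the incoming type names that the target type system declares.
     * lenient == true  : undeclared types are skipped silently.
     * lenient == false : the first undeclared type aborts the load.
     */
    static List<String> load(List<String> incomingTypes,
                             Set<String> declaredTypes,
                             boolean lenient) {
        List<String> accepted = new ArrayList<>();
        for (String typeName : incomingTypes) {
            if (declaredTypes.contains(typeName)) {
                accepted.add(typeName);
            } else if (!lenient) {
                throw new IllegalStateException(
                        "Type not declared in target: " + typeName);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        Set<String> declared =
                new HashSet<>(Arrays.asList("uima.tcas.DocumentAnnotation"));
        List<String> incoming =
                Arrays.asList("uima.tcas.DocumentAnnotation", "my.project.Keyterm");

        // Lenient: the undeclared type is dropped and no error is raised.
        System.out.println(load(incoming, declared, true));

        // Strict: the same input fails fast.
        try {
            load(incoming, declared, false);
        } catch (IllegalStateException e) {
            System.out.println("strict load failed: " + e.getMessage());
        }
    }
}
```

If annotations seem to be missing after a lenient load, checking whether their types are declared in the target CAS's type system is a reasonable first step.
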
> On Thu, Mar 15, 2012 at 5:59 PM, Marshall Schor<msa@schor.com>  wrote:
>>
>>
>> On 3/15/2012 4:38 PM, Eddie Epstein wrote:
>>>
>>> Cannot deserialize into a CAS from getEmptyCas().
>>
>> This is not right.  More information soon (ran out of time today). -Marshall
>>
>>> Must use a CAS from
>>> CasCreationUtils.createCas for deserialization, and then use casCopier
>>> to copy to the CAS from getEmptyCas().
>>>
>>> Pick the version of createCas that specifies a typesystem, and use the
>>> typesystem from the pipeline CAS (i.e. the one from getEmptyCas).
>>>
>>> On Thu, Mar 15, 2012 at 2:44 PM, Eric Riebling<er1k@cs.cmu.edu>    wrote:
>>>>
>>>> Thanks, guys.  This is getting me closer to the goal, and explains the
>>>> observed behaviors.  Now I'm facing issues when implemented as a CAS
>>>> Multiplier.  I try creating a new CAS first with getEmptyJCas().
>>>>
>>>> Here are some various strategies and what resulted:
>>>>
>>>>   * create a deserializer with the typesystem from the AE (which
>>>>         includes types in the 'external' CAS to be deserialized)
>>>>   * use it to deserialize into the empty CAS created with getEmptyJCas()
>>>>
>>>>   ->    The deserialized CAS for some reason has only the base TOP
>>>>         typesystem
>>>>   ->    Trying to access an annotation from an index (that should be there)
>>>>         generates the "used in Java code, but was not declared in the XML
>>>>         type descriptor" exception
>>>>
>>>>   * same as above, but use CasCopier to try and copy the type system
>>>>         (and everything else) from the CAS in the AE's process() method
>>>>           into the empty CAS
>>>>
>>>>   ->    Attempted to copy a FeatureStructure of type "(my type name)",
>>>>         which is not defined in the type system of the destination CAS.
>>>>
>>>> It seems the ONLY way to obtain a CAS (empty or otherwise) that has the
>>>> type system able to accept the external CAS being deserialized is to use
>>>> the very CAS passed into the AE's process() method.  Doing so obviously
>>>> mangles that CAS for the rest of the pipeline.
>>>>
>>>>
>>>> On 3/15/2012 1:50 PM, Marshall Schor wrote:
>>>>>
>>>>>
>>>>> On 3/15/2012 10:38 AM, Eric Riebling wrote:
>>>>>>
>>>>>> I have a pipeline with its own type system.
>>>>>> I also have serialized, annotated CASes on disk with a different type
>>>>>> system.
>>>>>> Suppose I want an Analysis Engine in the pipeline to read in the
>>>>>> serialized CASes in order to obtain annotations and 'do things with them'.
>>>>>>
>>>>>> I understand some limitations in the UIMA framework prevent this, but
>>>>>> could it be done by making the first type system include that of the
>>>>>> CASes to deserialize?
>>>>>
>>>>> Yes, I think so.
>>>>>>
>>>>>>
>>>>>> Also, it would necessitate creating new CASes within the Analysis Engine.
>>>>>> I could think of several approaches, and have tried some without success:
>>>>>>
>>>>>> * Create a new, 'temporary' View in the AE's process() method, obtain a
>>>>>> JCas, obtain its CAS, and use that to store the deserialized CASes
>>>>>> (seems to mangle the original CAS and break downstream AEs in the
>>>>>> pipeline, and seems to not be able to find any annotations in the
>>>>>> deserialized CAS)
>>>>>>
>>>>> This won't work. The deserialize method effectively "resets" the CAS
>>>>> before loading it.
>>>>> A view is not a new CAS; it is a new view of the same CAS.
>>>>>
>>>>>> * Use the CAS in the process() method to store the deserialized CASes
>>>>>> (also mangles the original CAS, breaks downstream AEs, but DOES
>>>>>> permit obtaining annotations from the deserialized CASes)
>>>>>
>>>>> Right, deserializing into an existing CAS resets it in flight.
>>>>>>
>>>>>>
>>>>>> * Make the Analysis Engine be a CAS Multiplier, and deserialize into
>>>>>> a CAS created with getEmptyCAS()
>>>>>> (I haven't tried this yet)
>>>>>
>>>>> Yes, this is the way to get a separate CAS instance to deserialize into.
>>>>> It's how Collection Readers do it.
>>>>> -Marshall
>>>>>>
>>>>>>
>>>>>> It's kind of a use case for a hybrid Component that behaves in some ways
>>>>>> like an AE (has a process() method), in some ways like an XMI Collection
>>>>>> Reader, and in some ways like a CAS Multiplier.
>>>>>>
>>>>>> But it's a useful use case! It is also a very bizarre one because you
>>>>>> could almost think of it as a pipeline within a pipeline, which processes
>>>>>> a set of serialized, annotated XMI documents, within a pipeline that
>>>>>> processes ... in our case, a Question Answering system with question
>>>>>> keyterms, ranked lists of documents and answer candidates.
>>>>>>
>>
>
