uima-user mailing list archives

From Eddie Epstein <eaepst...@gmail.com>
Subject Re: Getting annotations from CASes 'external' to a pipeline
Date Thu, 15 Mar 2012 22:50:09 GMT
My last note was incorrect. Here is a paraphrase of working code:

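import java.io.ByteArrayInputStream;
import java.io.IOException;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.AbstractCas;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.impl.XmiCasDeserializer;
import org.xml.sax.SAXException;

  // in a subclass of CasMultiplier_ImplBase
  public AbstractCas next() throws AnalysisEngineProcessException {
    CAS aCAS = getEmptyCAS();
    try {
      ByteArrayInputStream casIn = getNextXmiCas();
      // deserialize in a lenient fashion (types unknown to the
      // pipeline's type system are ignored instead of raising an error)
      XmiCasDeserializer.deserialize(casIn, aCAS, true);
      return aCAS;
    } catch (SAXException e) {
      aCAS.release(); // give the unused CAS back to the pool
      throw new AnalysisEngineProcessException(e);
    } catch (IOException e) {
      aCAS.release();
      throw new AnalysisEngineProcessException(e);
    }
...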
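
To make the role of the final `true` (lenient) argument concrete, here is a UIMA-free sketch. It models a type system as a plain set of type names and an XMI document as a list of typed annotations; a lenient load skips annotations whose type the target does not know, while a strict load fails on the first one. The class, the record, and the type names (`qa.Keyterm`, `ext.Candidate`) are hypothetical; this is a model of the documented lenient behavior, not UIMA's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, UIMA-free model of lenient vs. strict loading into a CAS.
public class LenientLoadSketch {

  // An incoming annotation: a type name plus its span (simplified).
  record Annotation(String type, int begin, int end) {}

  // Returns the annotations the target type system can hold.
  // lenient == true: unknown types are silently skipped.
  // lenient == false: the first unknown type aborts the load.
  static List<Annotation> load(List<Annotation> incoming, Set<String> typeSystem, boolean lenient) {
    List<Annotation> kept = new ArrayList<>();
    for (Annotation a : incoming) {
      if (typeSystem.contains(a.type())) {
        kept.add(a);
      } else if (!lenient) {
        throw new IllegalArgumentException("Unknown type: " + a.type());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Set<String> pipelineTypes = Set.of("uima.tcas.Annotation", "qa.Keyterm");
    List<Annotation> xmi = List.of(
        new Annotation("qa.Keyterm", 0, 5),
        new Annotation("ext.Candidate", 6, 12)); // not in the pipeline's type system
    System.out.println(load(xmi, pipelineTypes, true).size()); // prints 1
  }
}
```

The flip side is that anything dropped by a lenient load is simply gone, which is why the external types still need to be part of the pipeline's type system.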


On Thu, Mar 15, 2012 at 5:59 PM, Marshall Schor <msa@schor.com> wrote:
>
>
> On 3/15/2012 4:38 PM, Eddie Epstein wrote:
>>
>> Cannot deserialize into a CAS from getEmptyCas().
>
> This is not right.  More information soon (ran out of time today). -Marshall
>
>> Must use a CAS from
>> CasCreationUtils.createCas for deserialization, and then use CasCopier
>> to copy to the CAS from getEmptyCas().
>>
>> Pick the version of createCas that specifies a typesystem, and use the
>> typesystem from the pipeline CAS (i.e. the one from getEmptyCas).
>>
>> On Thu, Mar 15, 2012 at 2:44 PM, Eric Riebling <er1k@cs.cmu.edu> wrote:
>>>
>>> Thanks, guys.  This is getting me closer to the goal, and explains the
>>> observed behaviors.  Now I'm facing issues when it is implemented as a
>>> CAS Multiplier.  I try creating a new CAS first with getEmptyJCas().
>>>
>>> Here are some various strategies and what resulted:
>>>
>>>  * create a deserializer with the type system from the AE (which
>>>    includes the types in the 'external' CAS to be deserialized)
>>>  * use it to deserialize into the empty CAS created with getEmptyJCas()
>>>
>>>  ->  The deserialized CAS for some reason has only the base TOP
>>>      type system
>>>  ->  Trying to access an annotation from an index (that should be there)
>>>      generates the "used in Java code, but was not declared in the XML
>>>      type descriptor" exception
>>>
>>>  * same as above, but use CasCopier to try and copy the type system
>>>    (and everything else) from the CAS in the AE's process() method
>>>    into the empty CAS
>>>
>>>  ->  Attempted to copy a FeatureStructure of type "(my type name)", which
>>>      is not defined in the type system of the destination CAS.
>>>
>>> It seems the ONLY way to obtain a CAS (empty or otherwise) that has a
>>> type system able to accept the external CAS being deserialized is to use
>>> the very CAS passed into the AE's process() method.  Doing so obviously
>>> mangles that CAS for the rest of the pipeline.
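
Both error messages above appear to point at the same cause: the destination CAS's type system does not define the external types. A small diagnostic can make that explicit before any copy or load is attempted. The sketch below is a hypothetical helper, not part of UIMA; it treats a type system as a set of type names (real UIMA type systems also carry features and inheritance) and reports which of the source's types are missing from the destination.

```java
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper (not part of UIMA): lists the type names used by the
// source CAS that the destination type system does not define.
public class TypeCoverageCheck {

  static SortedSet<String> missingTypes(List<String> sourceTypes, Set<String> destTypeSystem) {
    SortedSet<String> missing = new TreeSet<>(); // sorted, duplicates removed
    for (String type : sourceTypes) {
      if (!destTypeSystem.contains(type)) {
        missing.add(type);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Types that occur in the external CAS (hypothetical names).
    List<String> used = List.of("uima.tcas.Annotation", "ext.Candidate", "ext.Passage", "ext.Candidate");
    // Destination type system that lacks the external types.
    Set<String> dest = Set.of("uima.cas.TOP", "uima.tcas.Annotation");
    System.out.println(missingTypes(used, dest)); // prints [ext.Candidate, ext.Passage]
  }
}
```

In this model a non-empty result corresponds to the "not defined in the type system of the destination CAS" situation; an empty one only rules out missing type names, not missing features.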
>>>
>>>
>>> On 3/15/2012 1:50 PM, Marshall Schor wrote:
>>>>
>>>>
>>>> On 3/15/2012 10:38 AM, Eric Riebling wrote:
>>>>>
>>>>> I have a pipeline with its own type system.
>>>>> I also have serialized, annotated CASes on disk with a different type
>>>>> system.
>>>>> Suppose I want an Analysis Engine in the pipeline to read in and
>>>>> deserialize those CASes in order to obtain annotations and 'do things
>>>>> with them'.
>>>>>
>>>>> I understand some limitations in the UIMA framework prevent this, but
>>>>> could it be done by making the first type system include that of the
>>>>> CASes to deserialize?
>>>>
>>>> Yes, I think so.
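
"Making the first type system include that of the CASes to deserialize" amounts to merging the two type systems. In UIMA this is typically done by importing the external type system descriptor into the pipeline's own type system descriptor, so the framework merges them when it builds the CASes. The sketch below only models the idea in plain Java: a type system is a map from type name to supertype name (a hypothetical simplification), the merge keeps every type from both sides, and a type declared on both sides with different supertypes is rejected. UIMA's own merge rules are more detailed than this.

```java
import java.util.HashMap;
import java.util.Map;

public class TypeSystemMergeSketch {

  // Simplified model: each type name maps to its supertype name.
  // Keeps every type from both inputs; a shared type with different
  // supertypes is treated as a conflict.
  static Map<String, String> merge(Map<String, String> pipeline, Map<String, String> external) {
    Map<String, String> merged = new HashMap<>(pipeline);
    for (Map.Entry<String, String> e : external.entrySet()) {
      String previous = merged.putIfAbsent(e.getKey(), e.getValue());
      if (previous != null && !previous.equals(e.getValue())) {
        throw new IllegalStateException(
            "Conflicting supertypes for " + e.getKey() + ": " + previous + " vs " + e.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> pipeline = Map.of("qa.Keyterm", "uima.tcas.Annotation");
    Map<String, String> external = Map.of(
        "ext.Candidate", "uima.tcas.Annotation",
        "qa.Keyterm", "uima.tcas.Annotation"); // shared type, same supertype: no conflict
    System.out.println(merge(pipeline, external).size()); // prints 2
  }
}
```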
>>>>>
>>>>>
>>>>> Also, it would necessitate creating new CASes within the Analysis
>>>>> Engine.  I could think of several approaches, and have tried some
>>>>> without success:
>>>>>
>>>>> * Create a new, 'temporary' View in the AE's process() method, obtain a
>>>>> JCas, obtain its CAS, and use that to store the deserialized CASes
>>>>> (this seems to mangle the original CAS and break downstream AEs in the
>>>>> pipeline, and seems unable to find any annotations in the deserialized
>>>>> CAS)
>>>>>
>>>> This won't work. The deserialize method effectively "resets" the CAS
>>>> before loading it.
>>>> A view is not a new CAS; it is a new view of the same CAS.
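
Why the temporary-view approach mangles the original CAS can be shown with a toy model: a view is only a handle onto its owning CAS, so a deserialize that resets "the CAS" wipes every view, including the initial one the rest of the pipeline relies on. The `Cas` and `View` classes below are hypothetical stand-ins, not UIMA's; only the view name `_InitialView` mirrors UIMA's default view name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ViewSketch {

  // One CAS owns the contents of all of its views.
  static final class Cas {
    final Map<String, List<String>> views = new HashMap<>();

    View getView(String name) {
      views.computeIfAbsent(name, k -> new ArrayList<>());
      return new View(this, name);
    }

    void reset() {
      views.clear(); // clears every view, not just one
    }
  }

  // A view is only a handle onto its owning CAS; it is not a separate CAS.
  record View(Cas cas, String name) {
    void add(String annotation) {
      cas.views.get(name).add(annotation);
    }

    // Models deserialization: the whole CAS is reset, then this view is loaded.
    void deserialize(List<String> annotations) {
      cas.reset();
      cas.views.put(name, new ArrayList<>(annotations));
    }
  }

  public static void main(String[] args) {
    Cas cas = new Cas();
    View initial = cas.getView("_InitialView");
    initial.add("Question");
    View temp = cas.getView("temp"); // hypothetical 'temporary' view name
    temp.deserialize(List.of("Candidate"));
    System.out.println(cas.views.containsKey("_InitialView")); // prints false
  }
}
```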
>>>>
>>>>> * Use the CAS in the process() method to store the deserialized CASes
>>>>> (also mangles the original CAS, breaks downstream AEs, but DOES
>>>>> permit obtaining annotations from the deserialized CASes)
>>>>
>>>> Right, deserializing into an existing CAS resets it in flight.
>>>>>
>>>>>
>>>>> * Make the Analysis Engine be a CAS Multiplier, and deserialize into
>>>>> a CAS created with createEmptyCas()
>>>>> (I haven't tried this yet)
>>>>
>>>> Yes, this is the way to get a separate CAS instance to deserialize into.
>>>> It's how Collection Readers do it.
>>>> -Marshall
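
The CAS Multiplier route can be sketched as the control flow below: process() only records which external documents to emit and never touches the input CAS, and each next() fills a fresh CAS taken from a pool. This is a plain-Java model under stated assumptions: `Cas`, `CasPool`, and `Multiplier` are hypothetical stand-ins, not UIMA classes, and adding strings stands in for XMI deserialization. In real UIMA the framework owns the pool and releases output CASes after downstream processing; the loop in main plays that role here, and this model simply throws when the pool runs dry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class MultiplierSketch {

  // Stand-in for a CAS: just a list of annotation labels.
  static final class Cas {
    final List<String> annotations = new ArrayList<>();
  }

  // Hypothetical bounded pool, standing in for the framework's CAS pool.
  static final class CasPool {
    private final Deque<Cas> free = new ArrayDeque<>();

    CasPool(int size) {
      for (int i = 0; i < size; i++) {
        free.push(new Cas());
      }
    }

    Cas getEmptyCas() {
      Cas cas = free.poll();
      if (cas == null) {
        throw new IllegalStateException("CAS pool exhausted");
      }
      return cas;
    }

    void release(Cas cas) {
      cas.annotations.clear();
      free.push(cas);
    }
  }

  // process() records the external documents; the input CAS is never modified.
  static final class Multiplier {
    private final CasPool pool;
    private Iterator<List<String>> pending = Collections.emptyIterator();

    Multiplier(CasPool pool) {
      this.pool = pool;
    }

    void process(Cas input, List<List<String>> externalDocs) {
      pending = externalDocs.iterator();
    }

    boolean hasNext() {
      return pending.hasNext();
    }

    Cas next() {
      Cas out = pool.getEmptyCas();
      out.annotations.addAll(pending.next()); // stands in for XMI deserialization
      return out;
    }
  }

  public static void main(String[] args) {
    CasPool pool = new CasPool(1);
    Multiplier multiplier = new Multiplier(pool);
    Cas input = new Cas();
    input.annotations.add("Question");
    multiplier.process(input, List.of(List.of("Doc1", "Answer1"), List.of("Doc2")));
    int total = 0;
    while (multiplier.hasNext()) {
      Cas out = multiplier.next();
      total += out.annotations.size();
      pool.release(out); // downstream is done with this CAS; return it to the pool
    }
    System.out.println(total + " " + input.annotations); // prints 3 [Question]
  }
}
```

With a pool of one, each output CAS has to be released before the next one can be obtained, which is also why a CAS taken from the pool should be released if deserialization fails.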
>>>>>
>>>>>
>>>>> It's kind of a use case for a hybrid component that behaves in some
>>>>> ways like an AE (has a process() method), in some ways like the XMI
>>>>> Collection Reader, and in some ways like a CAS Multiplier.
>>>>>
>>>>> But it's a useful use case! It is also a very bizarre one, because you
>>>>> could almost think of it as a pipeline within a pipeline, which
>>>>> processes a set of deserialized, annotated XMI documents, within a
>>>>> pipeline that processes
>>>>> ...
>>>>> in our case, a Question Answering system with question keyterms,
>>>>> ranked lists of documents and answer candidates.
>>>>>
>
