uima-dev mailing list archives

From Neal R Lewis <nrle...@us.ibm.com>
Subject Maintaining UIMAj indexing and references while using stable FSIDs in a CAS Store
Date Fri, 08 Feb 2013 23:11:52 GMT
It was brought up recently in a meeting that we need to consider how a Feature Structure
ID (FSID) in a CAS / CAS Store affects deserialization of a CAS into UIMAj and annotation indexing.

E.g., how would adding a stable identifier affect indexing and references within JCas objects?

I'd like to throw out a couple of scenarios to the community to see whether they cover all of the
possible use cases, describe how I currently implement them, and hopefully get some comments.

First, I'd like to confirm that I'm thinking of a CAS Store operating between different
PEARs or full UIMA applications, not between the components of an Aggregate analytic (although that
is definitely something to consider). Furthermore, I am assuming that the CAS Store interface
returns a CAS object that conforms to the OASIS spec, and that the CAS Store is responsible
for creating FSIDs.

I can think of four scenarios when deserializing a CAS XMI (I'm not sure about deserializing
from binary) to a JCas object, as it comes from the CAS Store.
1:  A minimal CAS that contains only a sofa and view. This is the simplest input to pull
from a CAS Store, and doesn't require any modifications to the UIMAj deserialization.

2:  A full CAS with a SOFA and associated annotations in multiple views

3:  A CAS Fragment (or projection) of a single CAS xmi from the store, that contains only
the information necessary for this particular Analytic Pipeline (there might or might not
be a SOFA and view associated with it).

4:  A CAS created from one or more analytics on different artifacts (zero or more cas:Sofa
elements, and zero or more View elements)

Currently, if I include FSIDs, I have to either set the deserialization to LENIENT or strip
them out of the CAS XMI in a preprocessing step before deserialization. Either way, the unknown attributes are simply dropped.
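
The preprocessing option could be sketched roughly as follows. This is only an illustration using the JDK's XML APIs, not code from UIMAj: the class name FsidStripper and the attribute name "fsid" are hypothetical, since the actual name the CAS Store would use for the identifier is not settled here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: removes a store-assigned ID attribute from every element of an XMI document. */
public class FsidStripper {

    /**
     * Returns a copy of the XMI text with the attribute attrName removed from all elements.
     * attrName is an assumed attribute name (for example "fsid"); all other attributes are kept.
     */
    public static String strip(String xmi, String attrName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XMI relies on namespace prefixes such as xmi: and cas:
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmi)));

            // "*" matches every element in the document, including the root.
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if (e.hasAttribute(attrName)) {
                    e.removeAttribute(attrName);
                }
            }

            // Serialize the modified DOM back to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not process XMI", ex);
        }
    }
}
```

The stripped XMI could then be handed to the normal UIMAj XMI deserialization without the lenient flag. The obvious cost is that the FSIDs are lost at that point, so a mapping from xmi:id to FSID would have to be kept separately if the identifiers need to survive the round trip.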

For scenario 1, other than lenient deserialization, nothing else needs to be done.

For scenarios 2 and 3, the associated type system of the CAS must be registered for serialization.

For scenario 4, I haven't implemented anything in UIMAj yet, but I will be working on something for this.

Now, I haven't dug into the serialization code yet to see how else this can be accomplished,
but I will be looking into it soon. I would just like to begin a discussion on this topic to
make sure that we're covering all our bases :)


