uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: JCasGen support for CAS-transported custom Java objects
Date Wed, 21 Oct 2020 20:34:42 GMT
Hi Mario,

> On 21. Oct 2020, at 21:26, Mario Juric <mario.juric@cactusglobal.com> wrote:
> 
> We never had problems migrating from one type system as long as the types were either extended
> or something was deleted. The problem we had was when an attribute changed type, e.g. a change
> from a simple FSArray to a wrapper type with the custom java object and a FSArray. We tried
> something similar last year where a type A had an FSArray attribute with elements of another
> type B that previously inherited from Annotation, and we changed that to inherit from TOP
> instead, while all of the attributes of B that we had declared remained unchanged. Not surprisingly
> the deserialiser couldn’t load the old CAS leniently with this change, and we never figured
> out how to do a conversion, if that is at all possible, since A can only take one form, i.e.
> we haven’t figured out how to have two versions of A simultaneously in order to make a conversion.
> Maybe there are some lower level CAS possibilities that we are not aware of yet. The problem
> should be the same when changing the type of an attribute from FSArray to a wrapper type with
> custom java objects.

Ok, I think I get the picture now. I was imagining creating a new type that would replace
the old one and copying the data over into the new structure. You are thinking of
modifying a type "in-place".

I think this is doable in the following way:

1) create a CAS "oldCas" with your existing type system

CAS oldCas = CasFactory.createCas(
  TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath("old_typesystem.xml"));

2) create a CAS "newCas" with your new type system

CAS newCas = CasFactory.createCas(
  TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath("new_typesystem.xml"));

3) implement a method taking two CASes and copying the data from one to the other while
   massaging relevant feature structures according to the changes in the type system

void copyAndUpgradeCas(CAS oldCas, CAS newCas) {
  // Recursively collect all accessible feature structures in oldCas
  // for each feature structure, create a copy in newCas
  // If the feature structure is of a type which changed, copy data according to the changes
  // otherwise, copy it 1-to-1 (or at least the primitive values)
  // record which old FS was mapped to which new FS, so that this mapping can be used to
  //   connect FS references in a second pass
  // in a second pass copy/convert the FS references (i.e. non-primitive features)
  // Optionally repeat the process for other views in the CAS
}

(Basically step 3 is in a sense CasCopier - just a custom one where you apply a data transformation
instead of just copying the data.)
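
The two-pass idea from step 3 can be sketched independently of UIMA. What follows is only a
minimal illustration in plain Java, not the CAS API: `Node`, `upgradeType` and `copyAndUpgrade`
are hypothetical names standing in for feature structures, the type mapping and the copy
routine. It assumes that all reachable feature structures have already been collected into a
list. In a real migration, the same two passes would be written against the CAS API instead.

```java
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TwoPassCopySketch {

  /** Toy stand-in for a feature structure; hypothetical, NOT a UIMA class. */
  static final class Node {
    final String type;
    final Map<String, String> primitives = new LinkedHashMap<>();
    final Map<String, Node> references = new LinkedHashMap<>();

    Node(String type) {
      this.type = type;
    }
  }

  /** Hypothetical hook: maps an old type name to its name in the new type system. */
  static String upgradeType(String oldType) {
    return oldType.equals("OldB") ? "NewB" : oldType;
  }

  /** Copies all nodes and returns the old-to-new mapping. */
  static Map<Node, Node> copyAndUpgrade(List<Node> oldNodes) {
    // Keyed by object identity, so two distinct but equal-looking nodes stay distinct.
    Map<Node, Node> oldToNew = new IdentityHashMap<>();

    // Pass 1: create a copy per node and carry over the primitive values only.
    for (Node old : oldNodes) {
      Node copy = new Node(upgradeType(old.type));
      copy.primitives.putAll(old.primitives);
      oldToNew.put(old, copy);
    }

    // Pass 2: every target now has a copy, so references can be reconnected.
    // A target that was not in oldNodes ends up as a null reference.
    for (Node old : oldNodes) {
      Node copy = oldToNew.get(old);
      for (Map.Entry<String, Node> ref : old.references.entrySet()) {
        copy.references.put(ref.getKey(), oldToNew.get(ref.getValue()));
      }
    }
    return oldToNew;
  }
}
```

Because the references are only wired up in the second pass, cyclic structures (A pointing to
B and B pointing back to A) need no special handling.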

Important for this to work is that you are using the CAS API and stay away from the JCas API!

If you had XMI data instead of binary CASes, I would have suggested that DKPro Cassis might
be a route to explore. With this library, you can load XMI CAS objects into Python and Python
objects are notoriously flexible and malleable - much more so than CAS / JCas objects. I didn't
dig into it, but I could imagine that a CAS and type system loaded using DKPro Cassis could
be monkey-patched in-place into a new structure. Then again, I haven't tried using Cassis
for this purpose, whereas I am quite confident that the Java-based approach I outlined
above is doable.

Cheerio,

-- Richard

