uima-user mailing list archives

From Mario Juric <mario.ju...@cactusglobal.com>
Subject Re: JCasGen support for CAS-transported custom Java objects
Date Thu, 22 Oct 2020 08:11:48 GMT
Thanks Richard.

This could probably work. We haven’t tried converting via XML yet, because we had problems
with some documents containing characters outside the allowed XML 1.0 range, but it should
be possible now that XML 1.1 is supported with the XMI serialiser. The corpus has to be sufficiently
large of course before this extra work and processing overhead pays off, but we have some
of these.
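
A minimal sketch of a pre-check for the character problem mentioned above, assuming the goal is simply to find documents whose text would be rejected by an XML 1.0 serialiser. The class and method names (`Xml10Chars`, `isXml10Char`, `firstInvalidIndex`) are hypothetical and not part of UIMA; the ranges follow the `Char` production of the XML 1.0 specification.

```java
/** Library-free check for code points that XML 1.0 cannot represent (hypothetical helper). */
final class Xml10Chars {
    private Xml10Chars() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char index of the first invalid code point, or -1 if there is none. */
    static int firstInvalidIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            // Advance by one or two chars, depending on whether cp is supplementary.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalidIndex("plain text"));
        System.out.println(firstInvalidIndex("form\ffeed"));
    }
}
```

Documents for which `firstInvalidIndex` returns -1 should be safe for XML 1.0; the others are the ones that need XML 1.1 output (or cleaning). Note that even XML 1.1 does not allow U+0000.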


> On 21 Oct 2020, at 22.34, Richard Eckart de Castilho <rec@apache.org> wrote:
> Hi Mario,
>> On 21. Oct 2020, at 21:26, Mario Juric <mario.juric@cactusglobal.com> wrote:
>> We never had problems migrating from one type system to another as long as types were only
extended or something was deleted. The problems arose when an attribute changed type,
e.g. a change from a simple FSArray to a wrapper type holding the custom Java object and an FSArray.
We tried something similar last year where a type A had an FSArray attribute with elements
of another type B that previously inherited from Annotation, and we changed B to inherit
from TOP instead, while all of the attributes that we had declared on B remained unchanged.
Not surprisingly, the deserialiser couldn’t load the old CAS leniently with this change,
and we never figured out how to do a conversion, if that is possible at all, since A can only
take one form, i.e. we haven’t figured out how to have two versions of A simultaneously
in order to make a conversion. Maybe there are some lower-level CAS possibilities that we
are not aware of yet. The problem should be the same when changing the type of an attribute
from FSArray to a wrapper type with custom Java objects.
> Ok, I think I get the picture now. I was imagining creating a new type that would replace
the old one and basically copying the data over into the new structure. You are thinking of
modifying a type "in-place".
> I think this is doable in the following way:
> 1) create a CAS "oldCas" with your existing type system
> CAS oldCas = CasFactory.createCas(
>  TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath("old_typesystem.xml"));
> 2) create a CAS "newCas" with your new type system
> CAS newCas = CasFactory.createCas(
>  TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath("new_typesystem.xml"));
> 3) implement a method taking two CASes and copying the data from one to the other while
>   massaging relevant feature structures according to the changes in the type system
> void copyAndUpgradeCas(CAS oldCas, CAS newCas) {
>  // Recursively collect all accessible feature structures in oldCas
>  // for each feature structure, create a copy in newCas
>  // If the feature structure is of a type which changed, copy data according to the changes
>  // otherwise, copy it 1-to-1 (or at least the primitive values)
>  // keep a map recording which old FS was mapped to which new FS; it is used for the
>  //   FS references in a second pass
>  // in a second pass copy/convert the FS references (i.e. non-primitive features)
>  // Optionally repeat the process for other views in the CAS
> }
> (Basically step 3 is in a sense CasCopier - just a custom one where you apply a data
> conversion instead of just copying the data.)
> Important for this to work is that you are using the CAS API and stay away from the JCas.
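
The two-pass idea in step 3 can be sketched without any UIMA dependency. The following is only a model of the technique, under stated assumptions: `Node` is a hypothetical stand-in for a feature structure (it is not a UIMA class), the list passed in is assumed to be the already collected set of reachable feature structures, and the only "type change" modelled is a rename supplied via `typeRenames`. In real code the same structure would be expressed through the CAS API on `oldCas` and `newCas`.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a feature structure; NOT a UIMA class. */
final class Node {
    final String type;
    final Map<String, Object> primitives = new LinkedHashMap<>();
    final Map<String, Node> references = new LinkedHashMap<>();

    Node(String type) {
        this.type = type;
    }
}

final class TwoPassCopy {
    private TwoPassCopy() {}

    /**
     * Copies every node in oldNodes into a new list, applying type renames,
     * then rewires references so that the copies point at copies.
     */
    static List<Node> copyAndUpgrade(List<Node> oldNodes, Map<String, String> typeRenames) {
        // Identity-based map: two distinct nodes with equal content must stay distinct.
        Map<Node, Node> oldToNew = new IdentityHashMap<>();
        List<Node> result = new ArrayList<>();

        // Pass 1: create each copy and transfer primitive values only.
        for (Node old : oldNodes) {
            Node copy = new Node(typeRenames.getOrDefault(old.type, old.type));
            copy.primitives.putAll(old.primitives);
            oldToNew.put(old, copy);
            result.add(copy);
        }

        // Pass 2: every target now has a copy, so references (including cycles) can be resolved.
        for (Node old : oldNodes) {
            Node copy = oldToNew.get(old);
            for (Map.Entry<String, Node> ref : old.references.entrySet()) {
                // A target that was not in oldNodes resolves to null here.
                copy.references.put(ref.getKey(), oldToNew.get(ref.getValue()));
            }
        }
        return result;
    }
}
```

Splitting the work into two passes is what makes cyclic references unproblematic: no reference is followed until every copy exists.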
> If you had XMI data instead of binary CASes, I would have suggested that DKPro Cassis
might be a route to explore. With this library, you can load XMI CAS objects into Python and
Python objects are notoriously flexible and malleable - much more so than CAS / JCas objects.
I didn't dig into it, but I could imagine that a CAS and type system loaded using DKPro Cassis
could be monkey-patched in-place into a new structure. I haven't tried using Cassis for this
purpose, but I am quite confident that the Java-based approach I outlined above should be doable.
> Cheerio,
> -- Richard
