uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: Is it possible to define dynamically typed annotations?
Date Sun, 16 Dec 2018 11:05:10 GMT
On 16. Dec 2018, at 01:56, Alain Désilets <alaindesilets0@gmail.com> wrote:
> I am not sure I understand what you wrote. Although I have been using UIMA
> for 2 years now, I am still baffled by it most of the time ;-).
> It SOUNDS like you are saying that it's possible to add new types in the
> XML typesystem file, and tell a RUNNING application to reload the XML file
> without having to recompile that application. Is that correct?


There is never a need to recompile as long as you simply stick to the CAS API.
It is possible to write a piece of Java code that sets up a type system, a
CAS, adds annotations, etc. without ever having to run JCasGen. E.g.

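// Imports needed by the snippet below
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.resource.metadata.TypeDescription;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.resource.metadata.impl.TypeSystemDescription_impl;
import org.apache.uima.util.CasCreationUtils;

// Define type system: one annotation type with a single integer feature
TypeSystemDescription tsd = new TypeSystemDescription_impl();
TypeDescription td = tsd.addType("TestType", "", CAS.TYPE_NAME_ANNOTATION);
td.addFeature("value", "", CAS.TYPE_NAME_INTEGER);

// Create CAS and initialize it with the previously created type system
CAS cas = CasCreationUtils.createCas(tsd, null, null);

// Add annotation to the CAS, looking up type and feature by name
Type type = cas.getTypeSystem().getType("TestType");
Feature feat = type.getFeatureByBaseName("value");
AnnotationFS fs = cas.createAnnotation(type, 0, 10);
fs.setIntValue(feat, 42); // any int value will do
cas.addFsToIndexes(fs); // index the annotation so it can be found again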

The context in which I am using the CAS reinitialization approach that I described
in my previous mail is annotation editors (i.e. WebAnno [1] and INCEpTION [2]).

Annotation editors are hardly useful if they only support hard-coded types,
i.e. if you need to recompile them to support custom types.

Both editors allow the user to configure their type system through a web-based UI. 
Internally, they represent the annotations in UIMA CAS objects which are persisted
in different ways.

When a user opens a document (i.e. loads a UIMA CAS object), the editor needs to make
sure that the CAS is compatible with whatever type system the user has defined. E.g.
the user might have added new types or features since the document was last opened, or
might even have removed some. Note that the editors do not permit changing the type
of a feature (but the user could remove and re-add it).

The way to ensure that the CAS object is compatible with the current type system is:

* load the CAS object from its persistence format
* serialize it in-memory into the "compressed binary" format
* (re-)initialize the CAS with the current type system
* deserialize the serialized data back into the (re-)initialized CAS

The last step is lenient and discards any types/features no longer present
in the current type system.
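
To make the effect of the lenient last step concrete, here is a toy model in
plain Java. It is NOT UIMA code: the class LenientReloadSketch, the Ann class
and the method reloadLeniently are hypothetical names made up for this
illustration, and the "type system" is reduced to a map from type name to the
permitted feature names. It only sketches the filtering behaviour described
above, not the actual binary (de)serialization.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these classes are part of the UIMA API. */
class LenientReloadSketch {

    /** A stored annotation: type name, offsets, and feature values by name. */
    static final class Ann {
        final String type;
        final int begin;
        final int end;
        final Map<String, Object> features;

        Ann(String type, int begin, int end, Map<String, Object> features) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.features = features;
        }
    }

    /**
     * Copies the stored annotations into the shape permitted by the current
     * type system (type name -> permitted feature names). Annotations of
     * unknown types and values of unknown features are silently dropped.
     */
    static List<Ann> reloadLeniently(List<Ann> stored,
                                     Map<String, Set<String>> currentTypes) {
        List<Ann> result = new ArrayList<>();
        for (Ann ann : stored) {
            Set<String> allowed = currentTypes.get(ann.type);
            if (allowed == null) {
                continue; // the type no longer exists: drop the annotation
            }
            Map<String, Object> kept = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : ann.features.entrySet()) {
                if (allowed.contains(e.getKey())) {
                    kept.put(e.getKey(), e.getValue()); // feature still exists
                }
            }
            result.add(new Ann(ann.type, ann.begin, ann.end, kept));
        }
        return result;
    }

    public static void main(String[] args) {
        // Data as it was persisted under the old type system.
        List<Ann> stored = List.of(
            new Ann("TestType", 0, 10, Map.of("value", 42, "obsolete", "x")),
            new Ann("RemovedType", 3, 5, Map.of()));

        // Current type system: "RemovedType" is gone, "TestType" has lost
        // the feature "obsolete" and gained the new feature "note".
        Map<String, Set<String>> current =
            Map.of("TestType", Set.of("value", "note"));

        for (Ann ann : reloadLeniently(stored, current)) {
            System.out.println(ann.type + " [" + ann.begin + "," + ann.end
                + "] " + ann.features);
        }
    }
}
```

In this model, newly added features (here "note") simply have no value on
existing annotations after the reload, while removed types and features vanish
without an error. That is the behaviour the editors rely on.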

The DKPro Core XMI Reader [3] incidentally uses the same approach in order to be able
to initialize the CAS from a type system file *while* a pipeline is being
executed. Normally the type system would need to be fixed *before* a pipeline
is executed.

It works for me, but it has its limits. E.g. such an approach is
not viable in a UIMA-AS setup (Jerry may correct me if I am wrong).

There have been thoughts running around from time to time of relaxing the
"committing" of the type system in the CAS. I believe that theoretically, it
may be possible to permit certain modifications to the type system even
after it has been "committed", i.e. within certain constraints, adding
new types and adding new features may be possible - but Marshall
can probably say more about this. A constraint would probably again be
that such a feature could not be used in distributed setups (at least not
without quite a bit of refactoring of the scale-out tools).


-- Richard

[1] https://github.com/webanno/webanno
[2] https://inception-project.github.io
[3] https://github.com/dkpro/dkpro-core/blob/master/dkpro-core-io-xmi-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/xmi/XmiReader.java#L117