uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: Basic UIMA questions
Date Thu, 14 Jan 2016 14:32:08 GMT
On 14.01.2016, at 15:09, Sean Crist <Sean.Crist@humedica.com> wrote:
> 
> Hi,
> 
> I have a few questions on the basic concepts of UIMA.  It’s fine if you tell
> me to read the manuals, but I haven’t been able to find the answers there so
> far, so a chapter reference would be a big help.
> 
> 1) If Annotator A creates an annotation, is it OK for Annotator B to modify
> the information in the annotations which A created?

In general, yes. Be careful, though, if you work with delta CAS or if you plan to modify
feature values which are used as index keys (e.g. begin/end offsets): how such modifications
are handled depends on the UIMA version you are using. Cf. here:

https://uima.apache.org/d/uimaj-2.7.0/references.html#ugr.ref.config.protect-index
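
The index-key caveat can be illustrated with a plain-Java analogy. This is a sketch, not
UIMA code: Ann, IndexKeyDemo, newIndex and moveTo are hypothetical names, and a TreeSet
stands in for a sorted annotation index. It assumes that changing a sort key while an
element is still inside a sorted structure can leave it filed in the wrong place, so the
sketch takes the element out, changes the key, and puts it back:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for an annotation whose begin offset is an index key.
class Ann {
    int begin;
    final String name;

    Ann(int begin, String name) {
        this.begin = begin;
        this.name = name;
    }
}

class IndexKeyDemo {
    // A sorted set ordered by begin offset, then by name (analogy for a sorted index).
    static TreeSet<Ann> newIndex() {
        return new TreeSet<>(
            Comparator.comparingInt((Ann a) -> a.begin).thenComparing(a -> a.name));
    }

    // Safe update: remove the element, change its key, then add it back.
    static void moveTo(TreeSet<Ann> index, Ann a, int newBegin) {
        index.remove(a);
        a.begin = newBegin;
        index.add(a);
    }

    public static void main(String[] args) {
        TreeSet<Ann> index = newIndex();
        Ann a = new Ann(0, "a");
        Ann b = new Ann(10, "b");
        index.add(a);
        index.add(b);
        moveTo(index, a, 30);
        System.out.println(index.first().name + " now sorts before " + index.last().name);
    }
}
```

The section linked above describes how a given UIMA version handles this situation.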

> 2) I’ve read that an annotation can contain a reference to another
> annotation, but I haven’t been able to find instructions or an example.
> 
> Possibly, I could generate the annotation class using JCasGen, and then
> manually augment the auto-generated code to support references to other
> annotation objects.  Is that a good way to do it?  Or is there some kind of
> built-in support?

First define the type X that you want to reference. Then define a type Y with a feature
whose range type is X. That's it. Cf.

http://stackoverflow.com/questions/34685195/uima-custom-type-with-custom-feature-type-range

JCasGen will generate the appropriate getters and setters for that feature/type.

> 3) Suppose I want a parser to build a parse tree over tokens.  A parse tree
> consists of a hierarchy of nodes.
> 
> I could represent each node as an annotation.  Is that the most UIMA-like solution?

Sure. A typical representation of a parse tree is this:

Constituent extends Annotation {
  Constituent parent;
  Array of Constituent children;
}

Cf. e.g. the documentation of the DKPro Core type system: 

https://dkpro.github.io/dkpro-core/documentation/

It is currently under the heading "DKPro Core 1.8.0-SNAPSHOT" - "Typesystem Reference". These
types are all defined as UIMA types, and the documentation is actually auto-generated from the
UIMA XML type descriptors in DKPro Core.
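
The Constituent layout above can be modeled in plain Java to make the parent/children
links concrete. This is only a sketch under stated assumptions: Span is a hypothetical
stand-in for an annotation with begin/end offsets, a List replaces the array feature, and
addChild is an invented helper. It is not what JCasGen generates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an annotation: just a character span.
class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

// Mirrors the Constituent type above: one parent reference, many children.
class Constituent extends Span {
    final String label;
    private Constituent parent; // stays null for the root
    private final List<Constituent> children = new ArrayList<>();

    Constituent(String label, int begin, int end) {
        super(begin, end);
        this.label = label;
    }

    // Sets both directions of the link in one place so they cannot disagree.
    void addChild(Constituent child) {
        child.parent = this;
        children.add(child);
    }

    Constituent getParent() {
        return parent;
    }

    List<Constituent> getChildren() {
        return Collections.unmodifiableList(children);
    }
}

class ConstituentDemo {
    public static void main(String[] args) {
        // "Dogs bark": S covers 0..9, NP covers 0..4, VP covers 5..9.
        Constituent s = new Constituent("S", 0, 9);
        Constituent np = new Constituent("NP", 0, 4);
        Constituent vp = new Constituent("VP", 5, 9);
        s.addChild(np);
        s.addChild(vp);
        System.out.println(np.getParent().label + " has " + s.getChildren().size() + " children");
    }
}
```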

> The reason I hesitate is this.  If I were writing a non-UIMA solution from
> scratch, I’d treat all of the nodes above the token level as abstract units,
> and those abstract units wouldn’t deal in concrete information such as the
> beginning and end of a character range.  I’d keep track of that only at the
> token level.  I think that all UIMA annotations are required to keep track of
> this information.

If you do not wish to duplicate offset information, you can derive your types from
AnnotationBase, which does not have begin/end features. That said, it is often a good idea
to repeat the offsets on higher-level annotations.
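
To make the trade-off concrete, here is a plain-Java sketch of the offset-free variant.
Token and Node are hypothetical classes, not UIMA types. Only tokens store offsets, and a
node derives its span from the tokens below it:

```java
import java.util.ArrayList;
import java.util.List;

// Only tokens store character offsets in this sketch.
class Token {
    final int begin;
    final int end;

    Token(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

// An offset-free node: it holds child nodes and/or tokens, nothing else.
class Node {
    final List<Node> children = new ArrayList<>();
    final List<Token> tokens = new ArrayList<>();

    // Smallest begin offset of any token below this node, or -1 if there is none.
    int begin() {
        int min = Integer.MAX_VALUE;
        for (Token t : tokens) {
            min = Math.min(min, t.begin);
        }
        for (Node c : children) {
            int b = c.begin();
            if (b >= 0) {
                min = Math.min(min, b);
            }
        }
        return min == Integer.MAX_VALUE ? -1 : min;
    }

    // Largest end offset of any token below this node, or -1 if there is none.
    int end() {
        int max = -1;
        for (Token t : tokens) {
            max = Math.max(max, t.end);
        }
        for (Node c : children) {
            max = Math.max(max, c.end());
        }
        return max;
    }
}
```

Deriving spans this way walks the subtree on every call. Storing begin/end on each node
trades that walk for duplicated data.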

> Also, it sounds like the only way for an annotator to retrieve existing
> annotations is to create an iterator and pull them out one by one.  I wish
> there were a way to just get a reference to the root node of my parse tree,
> so that I can simply step recursively through the tree (which assumes I’ve
> arranged for each node to contain references to its children).

The typical approach is to give the root node a dedicated type, e.g. ROOT (extends Constituent)
and then iterate over all ROOT annotations.
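
The root-type approach can be sketched in plain Java as follows. All names here (TreeNode,
RootNode, TreeWalk, roots, bracket) are hypothetical, and a flat list stands in for the
annotation index. In UIMA itself the filtering by type would be done through the index,
for example with uimaFIT's JCasUtil.select.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TreeNode {
    final String label;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String label) {
        this.label = label;
    }

    // Returns this node so that calls can be chained.
    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }
}

// Dedicated subtype marking the top of a tree, like ROOT extends Constituent.
class RootNode extends TreeNode {
    RootNode(String label) {
        super(label);
    }
}

class TreeWalk {
    // Stand-in for iterating an index by type: keep only RootNode instances.
    static List<RootNode> roots(List<TreeNode> index) {
        List<RootNode> result = new ArrayList<>();
        for (TreeNode n : index) {
            if (n instanceof RootNode) {
                result.add((RootNode) n);
            }
        }
        return result;
    }

    // Depth-first bracketed rendering that follows child references only.
    static String bracket(TreeNode node) {
        StringBuilder sb = new StringBuilder("(").append(node.label);
        for (TreeNode child : node.children) {
            sb.append(' ').append(bracket(child));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        TreeNode np = new TreeNode("NP");
        TreeNode vp = new TreeNode("VP");
        RootNode root = new RootNode("ROOT");
        root.add(np).add(vp);
        // Flat collection of all nodes, like an annotation index.
        List<TreeNode> index = Arrays.asList(np, vp, root);
        for (RootNode r : roots(index)) {
            System.out.println(bracket(r)); // prints (ROOT (NP) (VP))
        }
    }
}
```

Once the roots are found, the recursion never touches the flat collection again; it only
follows the child references.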

There are a number of type systems for UIMA that already define all kinds of annotation types
for linguistic annotations:

- DKPro Core
- ClearTK
- U-Compare
- JCoRe
- ... 

I would recommend using one of them instead of inventing your own from scratch.

Cheers,

-- Richard

Disclaimer: I'm also working on DKPro Core, so sorry for all the respective references ;)

