uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: Using OpenNLP type annotations with UIMAfit
Date Sun, 25 Jan 2015 22:15:41 GMT
Hi there,

> Hi,
> 
> The UIMAfit manual (5.1) states that the preferred way to iterate over tokens in
> the CAS is the following:
> 
>    // JCas version
>    for (Token token : JCasUtil.select(jcas, Token.class)) {
>      ...
>    }
> 
> This assumes a Token.class is importable somewhere. But I'm using the OpenNLP
> tools, which don't provide such a type. Instead, it seems to be generated at run
> time during configuration steps, and is not accessible as a class in the AE (to
> my knowledge.)

No, it is not generated at runtime. It is generated manually or at build time, e.g. using
the jcasgen-maven-plugin.

OpenNLP aims to be configurable with regard to types. So you must have *some* type system
that you configure OpenNLP to use, right? Open it in the Eclipse UIMA Type-System Editor and
hit the "JCasGen" button - it will generate the JCas classes that you can use with uimaFIT's
JCasUtil.

> Additionally, when extending o.a.u.fit.component.JCasAnnotator_ImplBase instead
> of o.a.u.component.JCasAnnotator_ImplBase, the method void typeSystemInit(TypeSystem)
> is not provided, which makes instantiating the type system the same way OpenNLP
> does it rather cumbersome (I generate an empty CAS with the typeSystemDescription,
> then get its TypeSystem and provide the Type and Feature objects from this
> TypeSystem instance as UIMAfit configuration parameters before deploying my AE.)

typeSystemInit() is meant for CAS-based analysis engines, not for JCas-based annotators.
You need the CAS-based API only if you want to configure your components at runtime with regard
to the annotation types they should use. If you can stick to a specific type system, use the
JCas-based analysis engines.

> Even then, I can only use the less type-safe method of iterating over
> annotations: for (AnnotationFS token : cas.getAnnotationIndex(tokenType)) where
> tokenType is the Type instance I acquired from the TypeSystem either during
> typeSystemInit() or during configuration with the above hack.

The CAS-API is not type-safe. Neither is the UIMA-JCas API, but the uimaFIT JCas-API is ;)
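
To make the difference concrete, here is a minimal plain-Java sketch. It does not use
UIMA or uimaFIT at all: the Annotation, Token, and Sentence classes and both select
methods are hypothetical stand-ins, written only to contrast a lookup by a runtime type
name with a lookup by a class literal in the style of JCasUtil.select(jcas, Token.class).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT UIMA or uimaFIT classes.
public class TypeSafeSelectSketch {

    public static class Annotation {
        public final int begin;
        public final int end;
        public Annotation(int begin, int end) { this.begin = begin; this.end = end; }
    }

    public static class Token extends Annotation {
        public Token(int begin, int end) { super(begin, end); }
    }

    public static class Sentence extends Annotation {
        public Sentence(int begin, int end) { super(begin, end); }
    }

    // Lookup by a type name resolved at runtime (similar in spirit to the CAS-based
    // style): the caller only gets plain Annotations back and has to cast.
    public static List<Annotation> selectByName(List<Annotation> index, String typeName) {
        List<Annotation> result = new ArrayList<>();
        for (Annotation a : index) {
            if (a.getClass().getSimpleName().equals(typeName)) {
                result.add(a);
            }
        }
        return result;
    }

    // Lookup by class literal (similar in spirit to JCasUtil.select): the element
    // type of the result is known to the compiler, so no cast is needed.
    public static <T extends Annotation> List<T> select(List<Annotation> index, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Annotation a : index) {
            if (type.isInstance(a)) {
                result.add(type.cast(a));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Annotation> index = new ArrayList<>();
        index.add(new Sentence(0, 11));
        index.add(new Token(0, 5));
        index.add(new Token(6, 11));

        // Typed variant: the loop variable is a Token without any cast.
        for (Token token : select(index, Token.class)) {
            System.out.println("typed:   " + token.begin + "-" + token.end);
        }

        // Name-based variant: a misspelled name would compile and silently match nothing.
        for (Annotation a : selectByName(index, "Token")) {
            Token token = (Token) a; // explicit cast, checked only at runtime
            System.out.println("untyped: " + token.begin + "-" + token.end);
        }
    }
}
```

With generated JCas classes, the class literal plays the role that Token.class plays
in this sketch, which is why a fixed type system makes the typed style possible.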

> Is there some good way of solving this dilemma while still using UIMAfit's
> classes? Obviously, I could go back to using just plain UIMA, but I quite like
> UIMAfit's way of dealing with external resources! And I don't like the
> type-system-through-cas hack.

Generate the JCas classes for your type system and you should be fine.

Alternatively, you could use a different OpenNLP binding for UIMA, e.g. the one provided
by DKPro Core [1] (not an Apache project, but one I'm working on too).

Cheers,

-- Richard

[1] https://code.google.com/p/dkpro-core-asl/