uima-user mailing list archives

From Rogan Creswick <cresw...@gmail.com>
Subject Re: Problem with multiple type identifiers when loading pears
Date Sat, 05 Sep 2009 19:23:11 GMT
On Sat, Sep 5, 2009 at 5:09 AM, Marshall Schor <msa@schor.com> wrote:
>
> Can you say a bit more what the problem is?
>

I think my problem is actually tangential to the issue with
JCasRegistry.  After retrieving types from the jcas TypeSystem, I
still ran into issues with multiple Redaction definitions because of
multiple copies of the bytecode being loaded (multiple class loaders,
I think).  I've worked around that -- see below if you're interested.
I'd like to hear suggestions to make it cleaner, but at least it's
working.

I still don't understand why JCasRegistry.register(...) shouldn't be
a true function.  It seems like there are at least two parallel ways
to retrieve types, and in my experience, they don't return the same
results--at least when getting filtered annotation indices.  The two
ways are JCasRegistry.getClassForIndex(MyAnnotationType.type) and
aJCas.getTypeSystem().getType(MyAnnotationType.class.getName()).

Anyhow, here's an overview of what we're doing -- it may shed some
light on this issue:


The UIMA portion of our application is a self-contained module (let's
call it 'core') that (once instantiated) takes a Document as input,
and returns a Collection<Violation>.  Violations are moderately
complex data structures that contain the fields of an Annotation
object -- specifically, a Redaction (Redaction is a JCasGen-generated
annotation subtype with some minor additional metadata that the
Annotators populate).

When core is instantiated, it gets a list of UIMA annotators to use to
generate Redaction objects, which are, in turn, translated by core
into Violations.  So, from core's perspective, each UIMA annotator is
just a module that generates a jcas with Redaction annotations.

The UIMA annotators need know nothing about core to function, although
they do have a dependency on core at the moment, so that they can all
share the same implementation of Redaction and Redaction_Type.

My intent was to use PEARs as the distribution mechanism for UIMA
annotators.  The core module would then be configured with a set of
key,value pairs that are provided to the AnalysisEngine as parameters,
and deployment would be a simple matter of dropping a pear in the
right place and then specifying an additional small section of core
config.

I now have this all working--and the generated PEARs can run
stand-alone too, which makes testing/debugging a good bit easier (we
can load them in the UIMA tools, for example)--but it reeks.

What I've ended up doing is installing the PEARs programmatically at
runtime (to simplify deployment), but loading them as PEARs prevents
core from providing parameter values (I don't understand why, but
UIMA_IllegalStateExceptions abound if you try that).  Instead, we're
using the PackageBrowser returned by installing the pear to determine
the non-pear descriptor and the classpath/datapath.  After filtering
out the core dependency from the classpath, it goes to a
ResourceManager that can be used to load the annotator properly, and
all the code involved can see one definition of the Redaction class.
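
The classpath-filtering step described above might look roughly like
the sketch below.  It uses only the Java standard library and makes
assumptions the message does not state: that the classpath is a single
string joined with the platform path separator, and that the core
dependency can be recognized by a file-name prefix.  The class name
ClasspathFilter, the method name, and the "core-" prefix are all
hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClasspathFilter {

    /**
     * Returns the classpath with every entry removed whose file name
     * starts with the given prefix. The order of the remaining entries
     * is preserved, and empty entries are dropped.
     */
    public static String withoutEntriesStartingWith(String classpath, String prefix) {
        List<String> kept = new ArrayList<>();
        // Quote the separator so it is treated literally by split().
        String separatorRegex = Pattern.quote(File.pathSeparator);
        for (String entry : classpath.split(separatorRegex)) {
            if (entry.isEmpty()) {
                continue;
            }
            // Compare against the last path component only, e.g. "core-1.0.jar".
            String fileName = new File(entry).getName();
            if (!fileName.startsWith(prefix)) {
                kept.add(entry);
            }
        }
        return String.join(File.pathSeparator, kept);
    }

    public static void main(String[] args) {
        String sep = File.pathSeparator;
        // Hypothetical classpath, as might be reported for an installed PEAR.
        String pearClasspath =
            "lib/annotator.jar" + sep + "lib/core-1.0.jar" + sep + "lib/util.jar";
        System.out.println(withoutEntriesStartingWith(pearClasspath, "core-"));
    }
}
```

The filtered string would then be handed to the ResourceManager as its
extension classpath, so that Redaction is resolved from the one copy
core already has loaded rather than from a second copy inside the PEAR.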

Thanks!
Rogan



> The use-case for Pears is to provide a shielded environment where the
> things in the PEAR can run with an independent classpath.  For example,
> a Pear component can define a JCas class called Token, which might have
> a different cover class than anyone else's Token.  While inside the
> PEAR, its Token JCas class would be used, while, outside the PEAR, other
> versions of this class might be used.  This is done on purpose.
>
> If you don't want this shielding behavior, you can get the non-shielding
> behavior by 1) installing the PEAR, 2) resolving any class path issues
> by hand, 3) setting up a common, appropriate class path for both the
> PEAR component(s) and the remaining components, and then 4) running with
> the normal descriptor for the component (not the Pear-specifier descriptor).
>
> -Marshall
>
