uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: uima-fit and uima annotators (in my case Whitespace annotator)
Date Thu, 23 Jan 2014 14:13:30 GMT

Can you provide the full code for your sample pipeline? I think that would make it easier
to help.

With the present information, I can only give some general advice.

- it is not mandatory to have the type system java classes (JCas wrappers) present in a project
if none of your components (Readers, AEs, CCs) use them.

- it is possible to manually load a type system description (TSD) and pass it to the components.
In that case, the TSD is the second argument to the createXXXDescription call, directly after
the component class and before the configuration parameters, e.g.

  createEngineDescription(SimpleCC.class, tsd, 
    SimpleCC.PARAM_OUTPUT_DIR, "…");

- the type systems of all components in a pipeline are automatically merged when a pipeline
is run (e.g. using SimplePipeline.runPipeline). Thus, it would also work to pass a TSD with
all types used in the pipeline only to the reader, but not to any of the subsequent components.

- alternatively, it is possible to have uimaFIT automatically detect your types [1]. If you
do that, there is no need at all to pass the TSD to the component - it happens automatically.

  createEngineDescription(SimpleCC.class,
    SimpleCC.PARAM_OUTPUT_DIR, "…");
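
As a rough sketch of the auto-detection setup described in [1]: uimaFIT 2.x looks on the
classpath for a file named META-INF/org.apache.uima.fit/types.txt that lists one type system
location pattern per line. The desc/types folder below is only an assumed layout, so adjust
the pattern to wherever your TSD XML files actually live.

```text
classpath*:desc/types/**/*.xml
```

In a Maven project this file would typically sit under src/main/resources, so that it ends
up on the classpath next to the TSD files it points to.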

- if you want to retrieve annotations from the CAS without using the JCas wrappers, you can
have a look at the CasUtil class. E.g.

  CasUtil.select(cas, CasUtil.getType(cas, "my.package.name.MyType"))

Note that this call works only if "MyType" inherits from the built-in "Annotation" type.
Otherwise, you would use "selectFS" instead of "select".

I would recommend using the CAS/CasUtil only if you want to implement a generic component
that can be configured to work with different types. If your component is fixed to a certain
type system, then using the JCas/JCasUtil is much more convenient.
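
To make the generic case a bit more concrete, here is a minimal sketch of a component that
prints annotations of a configurable type using only CAS/CasUtil, so no JCas wrapper classes
are needed. It assumes uimaFIT 2.x is on the classpath. The class name AnnotationPrinter, the
parameter PARAM_TYPE_NAME and the helper coveredTexts are made up for illustration and are
not part of uimaFIT.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.fit.component.CasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.util.CasUtil;

/** Hypothetical component that prints every annotation of a configurable type. */
public class AnnotationPrinter extends CasAnnotator_ImplBase {

  /** Hypothetical parameter: fully qualified name of the annotation type to print. */
  public static final String PARAM_TYPE_NAME = "typeName";

  @ConfigurationParameter(name = PARAM_TYPE_NAME, mandatory = true)
  private String typeName;

  /** Collects the covered text of all annotations of the given type, in index order. */
  public static List<String> coveredTexts(CAS cas, String typeName) {
    // Look the type up by name; no generated JCas wrapper class is involved.
    Type type = CasUtil.getType(cas, typeName);
    List<String> texts = new ArrayList<String>();
    // select() only works for subtypes of the built-in Annotation type.
    for (AnnotationFS annotation : CasUtil.select(cas, type)) {
      texts.add(annotation.getCoveredText());
    }
    return texts;
  }

  @Override
  public void process(CAS cas) throws AnalysisEngineProcessException {
    for (String text : coveredTexts(cas, typeName)) {
      System.out.println(typeName + ": " + text);
    }
  }
}
```

Such a component could then be configured with
createEngineDescription(AnnotationPrinter.class, AnnotationPrinter.PARAM_TYPE_NAME, "my.package.name.MyType").
The type itself still has to be known to the CAS, either through a TSD passed to one of the
components or through the automatic type detection.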

-- Richard

[1] http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.typesystem

On 23.01.2014, at 06:21, Luca Foppiano <luca@foppiano.org> wrote:

> Hi Everybody,
>    I'm starting playing with uima-fit and I'm trying to integrate the
> whitespace annotator into my simple pipeline composed by a collection
> reader a simple AE (plays with the text, doesn't annotate) and I want to
> add a whitespace annotator to be applied to the text.
> I've download the trunk version of the Whitespace annotator on github, I've
> extracted the type system definition from the descriptor XML and referenced
> it from uimafit. The pipeline worked without crashing.
> Now I want to add an AE that takes the annotations and do something with
> that (print them for example).
> I could not find a way to work around the fact the type system java class
> were not present in the project, is this a mandatory requirement?
> What I've tried is to do something like:
> //Get the type autogeneated type system (SentenceAnnotation,
> TokenAnnotation)
> TypeDescription[] types = tsd.getTypes();
> [...]
> //..and try to pass them to my annotator
>        AnalysisEngineDescription casConsumer =
> AnalysisEngineFactory.createEngineDescription(SimpleCC.class,
>                SimpleCC.OUTPUT_DIR_PARAM,
>                "/home/lf84914/development/epo/apl/data/out",
> *                types, null*);
> but then, in the AE's code, I have no idea how to use them.
> Any suggestions?
> Thank everybody in advance.
> -- 
> Luca Foppiano
> Software Engineer
> +31615253280
> luca@foppiano.org
> www.foppiano.org
