uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: uima-fit and uima annotators (in my case Whitespace annotator)
Date Wed, 29 Jan 2014 17:23:03 GMT
See comments inline. I've removed those parts that do not seem to require
further discussion.

On 29.01.2014, at 17:32, Luca Foppiano <luca@foppiano.org> wrote:
>> - the type systems of all components in a pipeline are automatically merged
>> when a pipeline is run (e.g. using SimplePipeline.runPipeline). Thus, it
>> would also work to pass a TSD with all types used in the pipeline only to
>> the reader, but not to any of the subsequent components.
> Ok, that's an important point in fact.
> Do you know whether the order (i.e. whether it is passed to the first or the
> last component) matters?

It does not matter. All type information from all components is merged
together and used to initialize the CASes that are passed through all
of the components.
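A minimal sketch of that merging, assuming UIMA 2.x and uimaFIT 2.x on the classpath. It does not run a real pipeline; it calls `CasCreationUtils.mergeTypeSystems` directly, which is similar in effect to what happens when `SimplePipeline.runPipeline` creates its CAS from the metadata of the reader and all engines. The type names `demo.Token` and `demo.Sentence` are hypothetical, and each one-type description merely stands in for the descriptor of one component. Merging in either order yields a CAS that knows both types.

```java
import java.util.Arrays;

import org.apache.uima.UIMAFramework;
import org.apache.uima.cas.CAS;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.util.CasCreationUtils;

public class MergeDemo {
    // Hypothetical type names, used only for this illustration
    static final String TOKEN = "demo.Token";
    static final String SENTENCE = "demo.Sentence";

    // A one-type TSD, standing in for the descriptor of a single component
    static TypeSystemDescription tsdWith(String typeName) {
        TypeSystemDescription tsd =
            UIMAFramework.getResourceSpecifierFactory().createTypeSystemDescription();
        tsd.addType(typeName, "", CAS.TYPE_NAME_ANNOTATION);
        return tsd;
    }

    // Merge the component TSDs in the given order and create a CAS from the result
    static CAS casFrom(TypeSystemDescription... parts) throws Exception {
        TypeSystemDescription merged = CasCreationUtils.mergeTypeSystems(Arrays.asList(parts));
        return CasCreationUtils.createCas(merged, null, null);
    }

    // True if the CAS type system declares both demo types
    static boolean knowsBoth(CAS cas) {
        return cas.getTypeSystem().getType(TOKEN) != null
            && cas.getTypeSystem().getType(SENTENCE) != null;
    }

    public static void main(String[] args) throws Exception {
        TypeSystemDescription reader = tsdWith(TOKEN);
        TypeSystemDescription engine = tsdWith(SENTENCE);
        System.out.println(knowsBoth(casFrom(reader, engine)));
        System.out.println(knowsBoth(casFrom(engine, reader)));
    }
}
```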

>> - alternatively, it is possible to have uimaFIT automatically detect your
>> types [1]. If you do that, there is no need at all to pass the TSD to the
>> component - it happens automatically.
>>  createEngineDescription(SimpleCC.class,
>>    SimpleCC.PARAM_OUTPUT_DIR, "…");
> OK. Do you have an example/use case of when the TSD should be passed to the
> engine? Perhaps when the type system is loaded by manually fetching the
> information or reading the descriptor programmatically?

It needs to be passed when you do not make use of uimaFIT's feature for
auto-detecting the type descriptors.

You may want to do this when you need extra control over the types you
pass in, e.g.:

- when you want to pass only a subset of the types that would be auto-discovered
- when you programmatically generate or modify a type system
- when there is a reason you cannot use auto-discovery (possibly in OSGi environments)
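For the programmatic case, a sketch (assuming uimaFIT 2.x) of building a type system in code and passing it explicitly to `createEngineDescription` instead of relying on auto-detection. uimaFIT's `NoOpAnnotator` only stands in for a real component, and the type `demo.Entity` with its `label` feature is hypothetical. To pass a subset of types from existing descriptors instead, `TypeSystemDescriptionFactory.createTypeSystemDescription(String...)` can load specific descriptors by name from the classpath.

```java
import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.cas.CAS;
import org.apache.uima.fit.component.NoOpAnnotator;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.resource.metadata.TypeDescription;
import org.apache.uima.resource.metadata.TypeSystemDescription;

public class ExplicitTsdDemo {
    // Build a small type system in code; "demo.Entity" and "label" are hypothetical
    static TypeSystemDescription buildTypeSystem() {
        TypeSystemDescription tsd =
            UIMAFramework.getResourceSpecifierFactory().createTypeSystemDescription();
        TypeDescription entity = tsd.addType("demo.Entity", "", CAS.TYPE_NAME_ANNOTATION);
        entity.addFeature("label", "", CAS.TYPE_NAME_STRING);
        return tsd;
    }

    // Pass the TSD explicitly; NoOpAnnotator is only a placeholder component
    static AnalysisEngineDescription describe(TypeSystemDescription tsd)
            throws ResourceInitializationException {
        return AnalysisEngineFactory.createEngineDescription(NoOpAnnotator.class, tsd);
    }

    public static void main(String[] args) throws Exception {
        AnalysisEngineDescription desc = describe(buildTypeSystem());
        // The explicitly passed type system is attached to the engine's metadata
        TypeSystemDescription attached = desc.getAnalysisEngineMetaData().getTypeSystem();
        System.out.println(attached.getType("demo.Entity") != null);
    }
}
```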

>> - if you want to retrieve annotations from the CAS without using the JCas
>> wrappers, you can have a look at the CasUtil class. E.g.
>>  CasUtil.select(cas, CasUtil.getType(cas, "my.package.name.MyType"))
>> Mind, this call works only if "MyType" inherits from the built-in
>> "Annotation" type. Otherwise, you would use "selectFS" instead of "select".
>> I would recommend using the CAS/CasUtil only if you want to implement a
>> generic component that can be configured to work with different types. If
>> your component is fixed to a certain type system, then using the
>> JCas/JCasUtil is much more convenient.
> OK, that's definitely helpful, but I still have a bit of confusion in my
> head between JCas and CAS.
> In my example I could use the JCas; the problem is that the JCasUtil.select()
> method requires the type as a Java Class
> [...] select([...],  *final Class<T> type*) [...]
> while the CAS/CasUtil select() method takes the type as a Type. Is
> there a reason for this difference? I might have missed or forgotten something,
> or some part of the documentation.

The JCas maps the UIMA type system to the Java type system; the CAS is one level
below that. Some people who first learned the JCas later try to use Java reflection
to dynamically create an annotation of a certain type, based on a type name passed
to the component as a parameter. If you ever think about using reflection on
JCas types, you should use the CAS interface instead.
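A sketch of that CAS-based alternative to reflection, assuming uimaFIT 2.x. The hypothetical `GenericTagger` receives a type name as a configuration parameter, looks up the `Type` with `CasUtil.getType`, and creates the annotation through the CAS interface, so no JCas class has to exist for that type. The parameter name `typeName`, the type `demo.Mention`, and annotating the whole document are illustrative choices only.

```java
import java.util.Collection;

import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.fit.component.CasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.util.CasUtil;
import org.apache.uima.resource.metadata.TypeSystemDescription;

public class GenericTagger extends CasAnnotator_ImplBase {
    // Hypothetical parameter: fully qualified name of the annotation type to create
    public static final String PARAM_TYPE_NAME = "typeName";
    @ConfigurationParameter(name = PARAM_TYPE_NAME, mandatory = true)
    private String typeName;

    @Override
    public void process(CAS cas) throws AnalysisEngineProcessException {
        // Resolve the type by name at runtime; no JCas class or reflection involved
        Type type = CasUtil.getType(cas, typeName);
        AnnotationFS ann = cas.createAnnotation(type, 0, cas.getDocumentText().length());
        cas.addFsIndexes(ann);
    }

    // Runs the tagger on one text and returns how many annotations of the type exist
    static int run(String text, String typeName) throws Exception {
        TypeSystemDescription tsd =
            UIMAFramework.getResourceSpecifierFactory().createTypeSystemDescription();
        tsd.addType(typeName, "", CAS.TYPE_NAME_ANNOTATION);
        AnalysisEngine engine = AnalysisEngineFactory.createEngine(
            GenericTagger.class, tsd, PARAM_TYPE_NAME, typeName);
        CAS cas = engine.newCAS();
        cas.setDocumentText(text);
        engine.process(cas);
        Collection<AnnotationFS> found = CasUtil.select(cas, CasUtil.getType(cas, typeName));
        return found.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("Hello UIMA", "demo.Mention"));
    }
}
```

Because the type is resolved only at runtime, the same component can be reused with any annotation type declared in the pipeline's merged type system.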

There may be reasons, such as performance or memory usage, that favor one interface
over the other; however, I have not made any extensive evaluations of that myself.
I generally favor convenient programming over premature optimization. So far,
the choice between JCas and CAS has not been a problem for me.
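To make the `Class<T>` vs. `Type` difference from the question concrete, a sketch (assuming uimaFIT 2.x) that selects the same annotations once through `JCasUtil` and once through `CasUtil`. It uses the built-in `uima.tcas.Annotation` type so that no generated JCas classes are needed; the text and offsets are arbitrary. Both calls hit the same index, so they should return the same number of annotations.

```java
import java.util.Collection;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.util.CasUtil;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class SelectDemo {
    // Returns the hit counts of JCasUtil.select, CasUtil.select and CasUtil.selectFS
    static int[] countAll(String text) throws Exception {
        JCas jcas = JCasFactory.createJCas();
        jcas.setDocumentText(text);
        new Annotation(jcas, 0, Math.min(5, text.length())).addToIndexes();

        // JCas level: the type is identified by its Java class
        Collection<Annotation> viaJCas = JCasUtil.select(jcas, Annotation.class);

        // CAS level: the same type is identified by a Type looked up by name
        CAS cas = jcas.getCas();
        Type type = CasUtil.getType(cas, CAS.TYPE_NAME_ANNOTATION);
        Collection<AnnotationFS> viaCas = CasUtil.select(cas, type);

        // selectFS would also accept types that do not inherit from uima.tcas.Annotation
        Collection<FeatureStructure> viaFs = CasUtil.selectFS(cas, type);

        return new int[] { viaJCas.size(), viaCas.size(), viaFs.size() };
    }

    public static void main(String[] args) throws Exception {
        int[] n = countAll("Hello UIMA");
        System.out.println(n[0] + " " + n[1] + " " + n[2]);
    }
}
```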


-- Richard
