uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: Finding Tokenizer references in UIMA Project
Date Fri, 25 Sep 2015 10:50:12 GMT
On 24.09.2015, at 08:41, Kahini Wadhawan <Kahini.Wadhawan@Colorado.EDU> wrote:

> Hi,
> 
> I am working on something based on a paper, and the paper mentions that
> they used the UIMA default tokenizer. This is confusing to me, as
> apparently there is no single default tokenizer. So, it would be great if I
> could get some insight on a few things:
> 
> 1) How can I find the default version of the UIMA-native tokenizer shipped
> with UIMA v2.2.2?

The only "default"/"native" tokenizers in UIMA that I am aware of are the
SimpleTokenAndSentenceAnnotator from the UIMA SDK examples and the
WhitespaceTokenizer from the UIMA Addons package.

I would assume the reference is to the SimpleTokenAndSentenceAnnotator,
which is based on the Java BreakIterator.
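To illustrate what "based on the Java BreakIterator" means, here is a minimal plain-Java sketch of BreakIterator-driven sentence and token splitting. It is not the source of SimpleTokenAndSentenceAnnotator: the class name BreakIteratorTokenizer, the Span holder, the choice of Locale.US, and the whitespace trimming are all assumptions made for this example. The begin/end offsets mirror how UIMA annotations record their position in the document text.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only (hypothetical class, not UIMA code): splits text into
// sentence and token spans with java.text.BreakIterator.
public class BreakIteratorTokenizer {

    /** A trimmed [begin, end) character span, analogous to a UIMA annotation's offsets. */
    public static final class Span {
        public final int begin;
        public final int end;
        public final String text;

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }

        @Override
        public String toString() {
            return "\"" + text + "\"[" + begin + "," + end + ")";
        }
    }

    /** Collects the segments between consecutive boundaries, trimming surrounding whitespace. */
    private static List<Span> segments(BreakIterator it, String text) {
        it.setText(text);
        List<Span> spans = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            int b = start;
            int e = end;
            while (b < e && Character.isWhitespace(text.charAt(b))) {
                b++;
            }
            while (e > b && Character.isWhitespace(text.charAt(e - 1))) {
                e--;
            }
            // Word iterators also report the gaps between words; those become empty and are skipped.
            if (b < e) {
                spans.add(new Span(b, e, text.substring(b, e)));
            }
        }
        return spans;
    }

    /** Sentence spans, using the JDK's sentence boundary rules. */
    public static List<Span> sentences(String text) {
        return segments(BreakIterator.getSentenceInstance(Locale.US), text);
    }

    /** Token spans (words and punctuation), using the JDK's word boundary rules. */
    public static List<Span> tokens(String text) {
        return segments(BreakIterator.getWordInstance(Locale.US), text);
    }

    public static void main(String[] args) {
        String text = "UIMA ships examples. This one uses BreakIterator.";
        for (Span sentence : sentences(text)) {
            System.out.println("Sentence " + sentence);
        }
        for (Span token : tokens(text)) {
            System.out.println("Token " + token);
        }
    }
}
```

A real UIMA annotator would additionally create an annotation in the CAS for each span; that part is omitted here.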

> 2) There is a UIMA project whose code I am exploring. I want to find
> which tokenizer it references. There are no XML descriptors in the
> code. Is it possible to have a UIMA project without XML descriptors? Where
> else can I get the tokenizer information in that project?

It is possible to use UIMA without XML descriptors if you create the resource
specifiers directly through the UIMA framework API. An alternative is uimaFIT,
which wraps the native UIMA API and provides a more convenient API for
programmatically handling UIMA components and workflows.

One option would be to use your IDE to search for all implementations of
the UIMA AnalysisComponent interface (in Eclipse, this is called
'Open Type Hierarchy').
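If the project is not imported into an IDE, a cruder alternative is a plain text search over the sources. The helper below (AnnotatorFinder is hypothetical, not part of UIMA) looks for three patterns: classes that directly extend JCasAnnotator_ImplBase or CasAnnotator_ImplBase; uimaFIT AnalysisEngineFactory calls that take a class literal (createEngineDescription/createEngine in uimaFIT 2.x, createPrimitiveDescription/createPrimitive in 1.x, which is an assumption worth double-checking against the uimaFIT version in use); and descriptors built in code via AnalysisEngineDescription.setAnnotatorImplementationName. It will miss components referenced in other ways, such as indirect subclasses or class names assembled at runtime, so treat its output as a starting point only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of UIMA: a crude regex search over .java sources for
// places where analysis components are defined or wired up without XML descriptors.
public class AnnotatorFinder {

    // Direct subclasses of UIMA's annotator base classes (both implement AnalysisComponent).
    private static final Pattern SUBCLASS = Pattern.compile(
            "class\\s+(\\w+)\\s+extends\\s+(JCasAnnotator_ImplBase|CasAnnotator_ImplBase)\\b");

    // Assumed uimaFIT factory method names that take a class literal:
    // createEngineDescription/createEngine (2.x), createPrimitiveDescription/createPrimitive (1.x).
    private static final Pattern FACTORY_CALL = Pattern.compile(
            "\\b(createEngineDescription|createEngine|createPrimitiveDescription|createPrimitive)"
            + "\\s*\\(\\s*([\\w.]+)\\.class");

    // Native UIMA descriptors built in code name the annotator class as a string literal.
    private static final Pattern IMPL_NAME = Pattern.compile(
            "setAnnotatorImplementationName\\s*\\(\\s*\"([\\w.$]+)\"");

    /** Returns one human-readable hit per match found in the given source text. */
    public static List<String> findMatches(String source) {
        List<String> hits = new ArrayList<>();
        Matcher m = SUBCLASS.matcher(source);
        while (m.find()) {
            hits.add("defines " + m.group(1) + " (extends " + m.group(2) + ")");
        }
        m = FACTORY_CALL.matcher(source);
        while (m.find()) {
            hits.add("uses " + m.group(2) + " via " + m.group(1));
        }
        m = IMPL_NAME.matcher(source);
        while (m.find()) {
            hits.add("uses " + m.group(1) + " via setAnnotatorImplementationName");
        }
        return hits;
    }

    /** Walks a source tree (first argument, default ".") and prints the hits per .java file. */
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> javaFiles;
        try (Stream<Path> files = Files.walk(root)) {
            javaFiles = files.filter(p -> p.toString().endsWith(".java")).collect(Collectors.toList());
        }
        for (Path file : javaFiles) {
            String source = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            for (String hit : findMatches(source)) {
                System.out.println(file + ": " + hit);
            }
        }
    }
}
```

Run it with the project's source directory as its only argument; each printed line names a file and a candidate component, which you can then inspect by hand to see which one does the tokenization.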

Cheers,

-- Richard