uima-user mailing list archives

From Richard Eckart de Castilho <richard.eck...@gmail.com>
Subject Re: Using uima pipeline as an API
Date Thu, 25 Jul 2013 06:21:45 GMT
On 25.07.2013 at 04:15, swirl <swirlobt@yahoo.com> wrote:
> Hi Richard,
> I was reading your reference for using JCasIterable
> (https://code.google.com/p/dkpro-core-asl/wiki/GroovyRecipies#OpenNLP_Part-of-speech_tagging_pipeline_using_JCasIterable_and_c),
> but I have some questions.
> Your example creates a JCasIterable using the following code:
> def pipeline = new JCasIterable(
>  createReaderDescription(TextReader,
>    TextReader.PARAM_PATH, args[0],
>    TextReader.PARAM_LANGUAGE, args[1],
>    TextReader.PARAM_PATTERNS, ["[+]*.txt"]),
>  createEngineDescription(OpenNlpSegmenter),
>  createEngineDescription(OpenNlpPosTagger));
> I assume that createReaderDescription() and createEngineDescription()
> return CollectionReaderDescription and AnalysisEngineDescription,
> respectively. But when I looked at the constructor for JCasIterable, it only
> accepts a CollectionReader and an AnalysisEngine array:
> JCasIterable(final CollectionReader aReader, final AnalysisEngine... 
> aEngines)
> Why is this so?

In the days of yore, uutuc/uimaFIT devs/users used instances (CollectionReader, AnalysisEngine)
more often. Later, we figured out that in those cases we had to take care of sending all the
life-cycle events (collectionProcessComplete, destroy) ourselves. It also had other potentially
unexpected effects: for example, a CollectionReader could not be re-used in several pipelines
because, after the first pipeline had run, it would be "empty" (hasNext() returns false).
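
To make the "empty reader" effect concrete, here is a toy model in plain Java. It is only a sketch: ToyReader and runPipeline are hypothetical stand-ins invented for this illustration, not UIMA or uimaFIT classes, and destroy() merely represents the life-cycle calls mentioned above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ExhaustedReaderDemo {

    // Hypothetical stand-in for a reader *instance*; not a UIMA class.
    static class ToyReader {
        private final Iterator<String> docs;
        boolean destroyed = false;

        ToyReader(List<String> documents) {
            this.docs = documents.iterator();
        }

        boolean hasNext() { return docs.hasNext(); }

        String next() { return docs.next(); }

        // Represents a life-cycle call that the caller has to send by hand.
        void destroy() { destroyed = true; }
    }

    // Drains the reader and returns how many documents were processed.
    static int runPipeline(ToyReader reader) {
        int processed = 0;
        while (reader.hasNext()) {
            reader.next();
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(Arrays.asList("a.txt", "b.txt"));
        System.out.println(runPipeline(reader)); // first pipeline sees both documents
        System.out.println(runPipeline(reader)); // second pipeline sees none
        reader.destroy(); // nobody does this for us when we hold the instance
    }
}
```

The second call processes nothing because the instance carries its iteration state with it; the same holds, by analogy, for a real reader instance that has already been read to the end.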

Today, it is considered a best practice to stick to descriptors as long as possible and
instantiate only when necessary. If possible, leave instantiation to a runtime engine like
SimplePipeline or CPE.
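
The idea can be sketched with a toy model in plain Java. Everything here is an assumption made for illustration: ToyReader, ToyReaderDescription and runPipeline are hypothetical names, not the UIMA or uimaFIT API. The description holds only configuration, and the runner plays the role of the runtime that instantiates, runs and cleans up.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DescriptorDemo {

    // Hypothetical stand-in for a reader instance (stateful).
    static class ToyReader {
        private final Iterator<String> docs;
        boolean destroyed = false;

        ToyReader(List<String> documents) { this.docs = documents.iterator(); }

        boolean hasNext() { return docs.hasNext(); }

        String next() { return docs.next(); }

        void destroy() { destroyed = true; }
    }

    // Hypothetical stand-in for a descriptor: configuration only, no iteration state.
    static class ToyReaderDescription {
        private final List<String> documents;

        ToyReaderDescription(List<String> documents) { this.documents = documents; }

        // Each call yields a fresh, unread instance.
        ToyReader instantiate() { return new ToyReader(documents); }
    }

    // Plays the role of the runtime: instantiate late, run, always clean up.
    static int runPipeline(ToyReaderDescription description) {
        ToyReader reader = description.instantiate();
        int processed = 0;
        try {
            while (reader.hasNext()) {
                reader.next();
                processed++;
            }
        } finally {
            reader.destroy(); // life-cycle handled by the runner, not the caller
        }
        return processed;
    }

    public static void main(String[] args) {
        ToyReaderDescription description =
                new ToyReaderDescription(Arrays.asList("a.txt", "b.txt"));
        System.out.println(runPipeline(description)); // processes both documents
        System.out.println(runPipeline(description)); // processes both again
    }
}
```

Because the description is just a recipe, it can be handed to any number of pipelines, and the caller never has to remember the clean-up calls.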

In uimaFIT 1.4.0 the JCasIterable only accepts CollectionReader and AnalysisEngine….

In uimaFIT 2.0.0, this changes to CollectionReaderDescription and AnalysisEngineDescription….

See also:

- UIMA-3041 [1] - JCasIterable should have signature accepting descriptors

- UIMA-3097 [2] - Split JCasIterable into iterable and iterator parts


-- Richard

[1] https://issues.apache.org/jira/browse/UIMA-3041

[2] https://issues.apache.org/jira/browse/UIMA-3097