uima-user mailing list archives

From "Roberto Franchini" <ro.franch...@gmail.com>
Subject Spring factoryBean for producing AE: processors, consumer, readers and PEAR
Date Wed, 09 Jul 2008 15:10:05 GMT
Hi,
I wrote some components useful for integrating UIMA components into the
Spring framework.
These components are Spring FactoryBeans that can produce
CasProcessors/CasConsumers, CollectionReaders and type systems.
A component can be built "totally programmatically", from a descriptor,
or from a PEAR.
I would like to release these components to the community, if that sounds good.
This work started from code posted by Steven Bethard on this mailing list.
Thanks a lot, Steven!
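
The factory-bean idea can be sketched without Spring or UIMA on the classpath. In the sketch below, `SimpleFactoryBean` is a local stand-in that mirrors the three methods of Spring's `FactoryBean` contract (`getObject`, `getObjectType`, `isSingleton`); `Configurable`, `EchoComponent` and `ReflectiveComponentFactory` are hypothetical names made up for illustration and are not the classes in `it.celi.uima.bean`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-in for Spring's FactoryBean contract (assumption: same three methods).
interface SimpleFactoryBean<T> {
    T getObject() throws Exception;
    Class<?> getObjectType();
    boolean isSingleton();
}

// Hypothetical component contract: anything that accepts configuration parameters.
interface Configurable {
    void configure(Map<String, Object> parameters);
}

// Hypothetical component, used only to exercise the factory.
class EchoComponent implements Configurable {
    final Map<String, Object> seen = new LinkedHashMap<>();

    @Override
    public void configure(Map<String, Object> parameters) {
        seen.putAll(parameters);
    }
}

// Instantiates the class named by componentClass and hands it the parameter map.
// This is only a guess at what componentClass/configurationParameters imply.
class ReflectiveComponentFactory implements SimpleFactoryBean<Configurable> {
    private String componentClass;
    private Map<String, Object> configurationParameters = new LinkedHashMap<>();

    public void setComponentClass(String componentClass) {
        this.componentClass = componentClass;
    }

    public void setConfigurationParameters(Map<String, Object> parameters) {
        this.configurationParameters = new LinkedHashMap<>(parameters);
    }

    @Override
    public Configurable getObject() {
        try {
            Class<?> type = Class.forName(componentClass);
            Configurable component =
                    (Configurable) type.getDeclaredConstructor().newInstance();
            component.configure(configurationParameters);
            return component;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create " + componentClass, e);
        }
    }

    @Override
    public Class<?> getObjectType() {
        return Configurable.class;
    }

    @Override
    public boolean isSingleton() {
        return false; // this sketch builds a fresh component on every call
    }
}
```

With such a factory, the container only needs a class name and a map of parameters, which is the shape of the bean definitions in the examples.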

Here are some usage examples:

<!-- collection reader -->
	<bean name="cr" class="it.celi.uima.bean.CollectionReaderFactoryBean"
parent="baseAnnotator">
		<property name="componentClass"
value="it.celi.components.collection.RecursiveFileSytemCollectionReader"
/>
		<property name="configurationParameters">
			<map>
				<entry key="application" value="language" />
				<entry key="language" value="it" />
			</map>
		</property>
	</bean>

where baseAnnotator is:
	<bean name="baseAnnotator"
class="it.celi.uima.bean.AbstractUIMAComponentsFactoryBean"
abstract="true">
		<property name="typeSystem" ref="typeSystem" />
	</bean>

	<bean name="typeSystem" class="it.celi.uima.bean.TypeSytemFactoryBean">
		<property name="typeSytemPath"
value="file:../dd4-typeSystem/src/main/resources/CeliTypeSystem.xml"
/>
	</bean>
	

Processors/consumers:

	<bean name="sentenceAnnotator"
class="it.celi.uima.bean.CasProcessorFactoryBean"
parent="baseAnnotator">
		<property name="componentClass"
value="it.celi.annotators.language.SentenceAnnotator" />
		<property name="configurationParameters">
			<map>
				<entry key="abbreviationsFiles" value="abbreviations_*.txt" />
				<entry key="additionalSeparatorsFiles" value="sentenceSeparators_*.txt" />
			</map>
		</property>
	</bean>

	<bean name="xslSerializerCasConsumer"
class="it.celi.uima.bean.CasConsumerFactoryBean"
parent="baseAnnotator">
		<property name="componentClass"
value="it.celi.components.consumer.XslSerializerCasConsumer" />
		<property name="configurationParameters">
			<map>
				<entry key="fileExtension" value=".xml" />
			</map>
		</property>
	</bean>


PEAR files (overriding configuration parameters is not allowed!):

	<bean name="japeAnnotator" class="it.celi.uima.bean.CasProcessorFactoryBean">
		<property name="descriptorPath" value="file:./pears/JapeAnnotator.pear" />
		<property name="redeployPear" value="true"/>

		<property name="configurationParameters">
			<map>
			</map>
		</property>
	</bean>

From a descriptor, with parameter overrides:

	<bean name="regExpTokenizer" class="it.celi.uima.bean.CasProcessorFactoryBean">
		<property name="descriptorPath" value="file:./desc/RegExpTokenizer.xml" />
		<property name="configurationParameters">
			<map>
				<entry key="commandsFileName" value="commands_tokenizer_*.xml" />
			</map>
		</property>
	</bean>
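
The PEAR and descriptor cases differ only in whether `configurationParameters` may override what the packaged component declares. One way such a rule could be enforced is sketched below. The class `ParameterResolver`, its `resolve` method, and the choice to throw instead of silently ignoring overrides are all assumptions for illustration; the sketch does not describe how the beans above actually behave.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: merges override parameters onto declared ones.
final class ParameterResolver {
    private ParameterResolver() {
    }

    static Map<String, Object> resolve(Map<String, Object> declared,
                                       Map<String, Object> overrides,
                                       boolean fromPear) {
        // Assumption: a PEAR is used as packaged, so any override is rejected.
        if (fromPear && !overrides.isEmpty()) {
            throw new IllegalArgumentException(
                    "parameters cannot be overridden for a PEAR: " + overrides.keySet());
        }
        // For a plain descriptor, overrides win over the declared values.
        Map<String, Object> effective = new LinkedHashMap<>(declared);
        effective.putAll(overrides);
        return effective;
    }
}
```

Failing early makes a misplaced override visible at startup, while an empty map (as in the PEAR example) passes through unchanged.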


A simple use case could be:

Configuration:

<bean name="cpm" class="org.apache.uima.UIMAFramework"
factory-method="newCollectionProcessingManager">

</bean>

	<bean name="uimaCPM" class="it.celi.uima.engine.CpmUIMAEngine">
		<property name="cpm" ref="cpm" />
		<property name="listeners">
			<list />
		</property>
		<property name="readers">

			<list>
				<ref bean="cr" />
			</list>
		</property>
		<property name="processors">
			<list>
				<ref bean="sentenceAnnotator" />
				<ref bean="regExpTokenizer" />
				<ref bean="japeAnnotator" />

			</list>
		</property>
		<property name="consumers">
			<list>
				<ref bean="xslSerializerCasConsumer" />
			</list>
		</property>
	</bean>


The last element is a CPM wrapper that internally does the following.

Methods to add consumers and processors to the cpm (the lists are injected by
the configuration above):

	private void addAllConsumersToCpm() {
		for (CasConsumer casConsumer : consumers) {
			String name = casConsumer.getProcessingResourceMetaData().getName();
			try {
				logger.info("adding consumer to pipeline::" + name);
				cpm.addCasConsumer(casConsumer);

			} catch (ResourceConfigurationException e) {

				logger.error("unable to add consumer  :: " + name, e);
			}
		}

	}

	private void addAllProcessorToCpm() {
		for (CasProcessor casProcessor : processors) {
			String name = casProcessor.getProcessingResourceMetaData().getName();

			try {
				logger.info("adding processor to pipeline::" + name);
				cpm.addCasProcessor(casProcessor);
			} catch (ResourceConfigurationException e) {
				logger.error("unable to add processor  :: " + name, e);
			}
		}

	}

and then a method can do:

			cpm.setCollectionReader(reader);
			cpm.process();
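
The methods above log a failed registration and carry on, so the pipeline can start with a component missing. A possible variation, sketched here with stand-in types, records which components were added and which failed so the caller can decide whether to run at all. `Pipeline`, `RegistrationReport` and `PipelineAssembler` are hypothetical names and not part of UIMA or of the wrapper shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the pipeline being assembled; not the UIMA CPM interface.
interface Pipeline {
    void add(String componentName) throws Exception;
}

// Outcome of one assembly pass, kept in registration order.
final class RegistrationReport {
    final List<String> added = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    boolean isComplete() {
        return failed.isEmpty();
    }
}

final class PipelineAssembler {
    private PipelineAssembler() {
    }

    // Registers components in list order and records each failure
    // in the report instead of only logging it.
    static RegistrationReport addAll(Pipeline pipeline, List<String> componentNames) {
        RegistrationReport report = new RegistrationReport();
        for (String name : componentNames) {
            try {
                pipeline.add(name);
                report.added.add(name);
            } catch (Exception e) {
                report.failed.add(name);
            }
        }
        return report;
    }
}
```

A caller could check `isComplete()` before starting the run and abort when a processor or consumer is missing.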


Some advantages:
-only one simple file to configure a cpm
-easy to inject components
-easy to embed a cpm/AE inside existing applications
-can use SpringIDE inside Eclipse
-....whatever?
Disadvantages:
-if you don't use Spring, it is another framework to learn
-you can't use the Eclipse UIMA plugins to edit/manage descriptors
-aggregates are not supported programmatically (via descriptors there is
no problem)
-....whatever?

Is it interesting? Let me know.

Roberto
-- 
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
jabber:ro.franchini@gmail.com skype:ro.franchini
