uima-user mailing list archives

From Abhik Lahiri <abhiklah...@gmail.com>
Subject newbie problem with aggregate AE - trying to combine WhitespaceTokenizer and HmmTagger
Date Sun, 19 Sep 2010 20:22:41 GMT
Hi all,
I am new to the UIMA framework, and I am having trouble running the
aggregate AE in the HmmTagger project from the UIMA sandbox. When I load
HmmtaggerAggregate.xml as the AE in CVD, I get the following exception:
org.apache.resource.ResourceInitializationException: Error
initializing "org.apache.uima.resource.impl.Data_Resource_impl" from the
descriptor file:/C:Users/Abhik/workspace/Tagger/desc/HmmTagger.xml .
I am running my project in Eclipse.
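
Since a ResourceInitializationException usually wraps the exception that actually failed, printing the full cause chain may show more than the single line above. The following is a minimal sketch using only java.lang (no UIMA calls); the class name RootCause and its two methods are hypothetical helpers, not part of UIMA or of the HmmTagger project.

```java
final class RootCause {
    private RootCause() {}

    /** Follows getCause() links until the innermost exception is reached. */
    static Throwable of(Throwable t) {
        Throwable current = t;
        // Stop when there is no further cause (or a cause that points to itself).
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** Renders the whole chain, one "class name: message" entry per line. */
    static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder();
        Throwable c = t;
        while (c != null) {
            sb.append(c.getClass().getName())
              .append(": ")
              .append(c.getMessage())
              .append('\n');
            Throwable next = c.getCause();
            c = (next == c) ? null : next;
        }
        return sb.toString();
    }
}
```

Wrapping the code that creates the analysis engine in a try/catch and printing RootCause.describe(e) would list every wrapped exception, with the innermost one last.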

I am pasting below the contents of the 4 .xml descriptor files in my
project. These are largely the same as the ones put up on the SVN server for
the HmmTagger code in UIMA sandbox:

HmmtaggerAggregate.xml:



<?xml version="1.0" encoding="UTF-8"?>

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">

<frameworkImplementation>

org.apache.uima.java

</frameworkImplementation>

<primitive>false</primitive>

<delegateAnalysisEngineSpecifiers>

<delegateAnalysisEngine key="SimpleTokenAndSentenceAnnotator">

<import location="WhitespaceTokenizer.xml" />

</delegateAnalysisEngine>

<delegateAnalysisEngine key="HmmTagger">

<import location="HmmTagger.xml" />

</delegateAnalysisEngine>

</delegateAnalysisEngineSpecifiers>

<analysisEngineMetaData>

<name>HmmTaggerTAE</name>

<description />

<version />

<vendor />

<configurationParameters searchStrategy="language_fallback" />

<configurationParameterSettings />

<flowConstraints>

<fixedFlow>

<node>SimpleTokenAndSentenceAnnotator</node>

<node>HmmTagger</node>

</fixedFlow>

</flowConstraints>

<typePriorities />

<fsIndexCollection />

<capabilities />

<operationalProperties>

<modifiesCas>true</modifiesCas>

<multipleDeploymentAllowed>true</multipleDeploymentAllowed>

<outputsNewCASes>false</outputsNewCASes>

</operationalProperties>

</analysisEngineMetaData>

<resourceManagerConfiguration />

</analysisEngineDescription>

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This is the text in my WhitespaceTokenizer.xml file:



<?xml version="1.0" encoding="UTF-8"?>

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">

<frameworkImplementation>org.apache.uima.java</frameworkImplementation>

<primitive>true</primitive>

<annotatorImplementationName>org.apache.uima.annotator.WhitespaceTokenizer</annotatorImplementationName>

<analysisEngineMetaData>

<name>WhitespaceTokenizer</name>

<description>creates token and sentence annotations for whitespace

separated languages</description>

<version>1.0</version>

<vendor>The Apache Software Foundation</vendor>

<configurationParameters>

<configurationParameter>

<name>SofaNames</name>

<description>The Sofa names the annotator should work on. If no

names are specified, the annotator works on the

default sofa.</description>

<type>String</type>

<multiValued>true</multiValued>

<mandatory>false</mandatory>

</configurationParameter>

</configurationParameters>

<configurationParameterSettings/>

<typeSystemDescription>

<types>

<typeDescription>

<name>org.apache.uima.TokenAnnotation</name>

<description>Single token annotation</description>

<supertypeName>uima.tcas.Annotation</supertypeName>

<features>

<featureDescription>

<name>tokenType</name>

<description>token type</description>

<rangeTypeName>uima.cas.String</rangeTypeName>

</featureDescription>

<featureDescription>

<name>posTag</name>

<description/>

<rangeTypeName>uima.cas.String</rangeTypeName>

</featureDescription>

</features>

</typeDescription>

<typeDescription>

<name>org.apache.uima.SentenceAnnotation</name>

<description>sentence annotation</description>

<supertypeName>uima.tcas.Annotation</supertypeName>

</typeDescription>

</types>

</typeSystemDescription>

<fsIndexCollection/>

<capabilities>

<capability>

<inputs/>

<outputs>

<type>org.apache.uima.TokenAnnotation</type>

<feature>org.apache.uima.TokenAnnotation:tokentype</feature>

<type>org.apache.uima.SentenceAnnotation</type>

</outputs>

<languagesSupported>

<language>x-unspecified</language>

</languagesSupported>

</capability>

</capabilities>

<operationalProperties>

<modifiesCas>true</modifiesCas>

<multipleDeploymentAllowed>true</multipleDeploymentAllowed>

<outputsNewCASes>false</outputsNewCASes>

</operationalProperties>

</analysisEngineMetaData>

</analysisEngineDescription>

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This is the text in my HmmTagger.xml file :





<?xml version="1.0" encoding="UTF-8"?>

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">

<frameworkImplementation>org.apache.uima.java</frameworkImplementation>

<primitive>true</primitive>

<annotatorImplementationName>org.apache.uima.examples.tagger.HMMTagger</annotatorImplementationName>

<analysisEngineMetaData>

<name>Hidden Markov Model - Part of Speech Tagger</name>

<description>A configuration of the HmmTaggerAnnotator that looks for

parts of speech of identified tokens within existing

Sentence and Token annotations. See also

WhitespaceTokenizer.xml.</description>

<version>1.0</version>

<vendor>The Apache Software Foundation</vendor>

<configurationParameters>

<configurationParameter>

<name>NGRAM_SIZE</name>

<type>Integer</type>

<multiValued>false</multiValued>

<mandatory>true</mandatory>

</configurationParameter>

</configurationParameters>

<configurationParameterSettings>

<nameValuePair>

<name>NGRAM_SIZE</name>

<value>

<integer>3</integer>

</value>

</nameValuePair>

</configurationParameterSettings>

<typeSystemDescription>

<types>

<typeDescription>

<name>org.apache.uima.TokenAnnotation</name>

<description>Single token annotation</description>

<supertypeName>uima.tcas.Annotation</supertypeName>

<features>

<featureDescription>

<name>posTag</name>

<description>contains part-of-speech of a

corresponding token</description>

<rangeTypeName>uima.cas.String</rangeTypeName>

</featureDescription>

</features>

</typeDescription>

<typeDescription>

<name>org.apache.uima.SentenceAnnotation</name>

<description>sentence annotation</description>

<supertypeName>uima.tcas.Annotation</supertypeName>

</typeDescription>

</types>

</typeSystemDescription>

<typePriorities/>

<fsIndexCollection/>

<capabilities>

<capability>

<inputs>

<type>org.apache.uima.TokenAnnotation</type>

<type allAnnotatorFeatures="true">org.apache.uima.SentenceAnnotation</type>

<feature>org.apache.uima.TokenAnnotation:end</feature>

<feature>org.apache.uima.TokenAnnotation:begin</feature>

</inputs>

<outputs>

<type>org.apache.uima.TokenAnnotation</type>

<feature>org.apache.uima.TokenAnnotation:posTag</feature>

<feature>org.apache.uima.TokenAnnotation:end</feature>

<feature>org.apache.uima.TokenAnnotation:begin</feature>

</outputs>

<languagesSupported/>

</capability>

</capabilities>

<operationalProperties>

<modifiesCas>true</modifiesCas>

<multipleDeploymentAllowed>true</multipleDeploymentAllowed>

<outputsNewCASes>false</outputsNewCASes>

</operationalProperties>

</analysisEngineMetaData>

<externalResourceDependencies>

<externalResourceDependency>

<key>Model</key>

<description>HMM Tagger model file</description>

<interfaceName>org.apache.uima.examples.tagger.IModelResource</interfaceName>

<optional>false</optional>

</externalResourceDependency>

</externalResourceDependencies>

<resourceManagerConfiguration>

<externalResources>

<externalResource>

<name>ModelFile</name>

<description>HMM Tagger model file</description>

<fileResourceSpecifier>

<fileUrl>file:english/BrownModel.dat</fileUrl>

</fileResourceSpecifier>

<implementationName>org.apache.uima.examples.tagger.ModelResource</implementationName>

</externalResource>

</externalResources>

<externalResourceBindings>

<externalResourceBinding>

<key>Model</key>

<resourceName>ModelFile</resourceName>

</externalResourceBinding>

</externalResourceBindings>

</resourceManagerConfiguration>

</analysisEngineDescription>
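
The exception above points at HmmTagger.xml, and the only external resource that descriptor declares is the model at file:english/BrownModel.dat. Whether that relative path is the culprit is only a guess; the sketch below is a plain-Java check (no UIMA calls, and not a reproduction of UIMA's own resource lookup) of whether the path can be found relative to a base directory or on the classpath. The class name ModelPathCheck and the method locate are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.net.URL;

final class ModelPathCheck {
    private ModelPathCheck() {}

    /**
     * Reports where relativePath was found: under baseDir on the file system,
     * or through the given class loader. Returns "not found" otherwise.
     */
    static String locate(String relativePath, File baseDir, ClassLoader loader) {
        File onDisk = new File(baseDir, relativePath);
        if (onDisk.isFile()) {
            return "file system: " + onDisk.getAbsolutePath();
        }
        URL onClasspath = loader.getResource(relativePath);
        if (onClasspath != null) {
            return "classpath: " + onClasspath;
        }
        return "not found";
    }

    public static void main(String[] args) {
        // Assumption: the path is the one from <fileUrl> without the "file:" prefix.
        File workingDir = new File(System.getProperty("user.dir"));
        ClassLoader loader = ModelPathCheck.class.getClassLoader();
        System.out.println(locate("english/BrownModel.dat", workingDir, loader));
    }
}
```

Running main from the same Eclipse launch configuration as the project shows whether english/BrownModel.dat is visible from that working directory or classpath. A "not found" result would make the model file location worth a closer look, but it would not prove that this is the cause of the error.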

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



And finally, this is the text in my HmmModelTrainer.xml file:





<?xml version="1.0" encoding="UTF-8"?>

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">

<frameworkImplementation>org.apache.uima.java</frameworkImplementation>

<primitive>true</primitive>

<annotatorImplementationName>org.apache.uima.examples.tagger.HMMModelTrainer</annotatorImplementationName>

<analysisEngineMetaData>

<name>HMMModelTrainer</name>

<description>This analysis engine trains an N-gram model for the HMM tagger.
It uses a training corpus as reference. This corpus must contain
annotations on words with an attribute corresponding of the POS value to be
learned.

The configuration of this analysis engine is done through several
parameters:

&lt;ul&gt;

&lt;li&gt;View: - the view from which the tokens will be extracted&lt;/li&gt;

&lt;li&gt;ModelExportFile: - the path where the model will be written&lt;/li&gt;

&lt;li&gt;FeaturePathPOS: - feature path to the value of the POS to be
learned. The annotation should exactly cover a "word".&lt;/li&gt;

&lt;/ul&gt;

&lt;b&gt;BEWARE: this analysis engine does not allow multiple deployment!&lt;/b&gt;

&lt;i&gt;NB. At the moment: both bi and trigram statistics are saved in
one model file.&lt;/i&gt;</description>

<version>1.0</version>

<vendor/>

<configurationParameters>

<configurationParameter>

<name>View</name>

<description>The view from which the tokens will be extracted.</description>

<type>String</type>

<multiValued>false</multiValued>

<mandatory>true</mandatory>

</configurationParameter>

<configurationParameter>

<name>ModelExportFile</name>

<description>The path where the model will be written.</description>

<type>String</type>

<multiValued>false</multiValued>

<mandatory>true</mandatory>

</configurationParameter>

<configurationParameter>

<name>FeaturePathPOS</name>

<description>Feature path to the value of the POS to be learnt. The
annotation should exactly cover a "word".</description>

<type>String</type>

<multiValued>false</multiValued>

<mandatory>true</mandatory>

</configurationParameter>

</configurationParameters>

<configurationParameterSettings>

<nameValuePair>

<name>View</name>

<value>

<string>_InitialView</string>

</value>

</nameValuePair>

<nameValuePair>

<name>ModelExportFile</name>

<value>

<string>hmmtagger_model.dat</string>

</value>

</nameValuePair>

<nameValuePair>

<name>FeaturePathPOS</name>

<value>

<string>org.apache.uima.TokenAnnotation:posTag</string>

</value>

</nameValuePair>

</configurationParameterSettings>

<typeSystemDescription/>

<typePriorities/>

<fsIndexCollection/>

<capabilities>

<capability>

<inputs/>

<outputs/>

<languagesSupported/>

</capability>

</capabilities>

<operationalProperties>

<modifiesCas>false</modifiesCas>

<multipleDeploymentAllowed>false</multipleDeploymentAllowed>

<outputsNewCASes>false</outputsNewCASes>

</operationalProperties>

</analysisEngineMetaData>

<resourceManagerConfiguration/>

</analysisEngineDescription>



I realize this has turned into a very long email for a single question, but I
am completely new to the UIMA framework and would be extremely grateful to
anyone who can help me out here.

Thanks a lot for your help!

Regards,

Abhik
