uima-user mailing list archives

From Bonnie MacKellar <bkmackel...@gmail.com>
Subject Re: problems integrating Ruta and uimaFit
Date Fri, 24 Jun 2016 00:55:14 GMT
Hi,
I just wanted to say thanks - your description gave me enough clues that I
finally got this to work. I think I have some questions, though, about WHY
certain things work, but since I am preparing to go out of town, I will
wait on those. I need to understand what I did better so I can configure
these things faster in the future.

thanks,
Bonnie MacKellar

On Thu, Jun 23, 2016 at 5:07 AM, Peter Klügl <peter.kluegl@averbis.com>
wrote:

> Hi,
>
>
> sorry, here's just a short reply since I am currently travelling. If
> the problem still exists I will try to reproduce it and reply with more
> details next week.
>
>
> Yes, in simple UIMA Ruta projects, these descriptors are copied to
> descriptor/utils when you create the project. The descriptor folder is
> listed in the buildpath as a "descriptor" folder, which is where imported
> descriptors are looked up.
>
> UIMA Ruta currently supports two ways to find the descriptors: the
> absolute paths specified in the descriptorPaths configuration parameter
> and the classpath. Thus, the simplest way for you would be to use the
> classpath to find the descriptor instead of the descriptorPaths (which
> points to the descriptor folder of your ruta project).
>
> Changing the imports to something like: UIMAFIT
> org.apache.uima.ruta.engine.PlainTextAnnotator should do the trick (you
> also need to adapt the TYPESYSTEM import). Then the script does not
> depend on the project structure.
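
A quick way to check the classpath route is to ask the class loader directly whether a descriptor is visible as a resource. The following is a minimal sketch in plain Java (JDK only). `DescriptorLookup` is a hypothetical helper, not part of UIMA Ruta or uimaFIT. The mapping from a dotted name to a resource path ending in `.xml` is an assumption about how the descriptors are packaged, and the `PlainTextTypeSystem` name is assumed from the script's `utils.PlainTextTypeSystem` import.

```java
import java.net.URL;

// Hypothetical diagnostic helper; not part of UIMA Ruta or uimaFIT.
final class DescriptorLookup {

    // Assumption: a dotted import name maps to a classpath resource with an ".xml" suffix.
    static String toResourcePath(String dottedName) {
        return dottedName.replace('.', '/') + ".xml";
    }

    // Returns the URL of the descriptor on the classpath, or null if it is not visible.
    static URL locate(String dottedName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(toResourcePath(dottedName));
    }

    public static void main(String[] args) {
        // Names taken from the reply above; the type system name is an assumption.
        String[] names = {
            "org.apache.uima.ruta.engine.PlainTextAnnotator",
            "org.apache.uima.ruta.engine.PlainTextTypeSystem"
        };
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url != null ? url : "not on classpath"));
        }
    }
}
```

Run from the Maven project's runtime classpath, a "not on classpath" line would suggest that the corresponding import cannot be resolved this way and that the dependency providing the descriptor is missing.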
>
>
> If you use the SourceDocumentInformation type system in your ruta
> script, then you need to include it separately. In some situations, the
> Ruta Workbench does that automatically for you. However, it is not
> mentioned in types.txt in ruta-core. So you need to add it there in your
> maven project so that the typesystem scanning of uimaFIT finds it.
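
To see which type system locations the uimaFIT scanning can pick up, it may help to list every types.txt visible on the classpath. The sketch below uses only the JDK. `TypesTxtReport` is a hypothetical helper. The manifest path `META-INF/org.apache.uima.fit/types.txt` is assumed here by analogy with the components.txt path mentioned in the POM comments, and the one-pattern-per-line layout is also an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical diagnostic helper; not part of uimaFIT.
final class TypesTxtReport {

    // Assumed location of the uimaFIT type system manifest on the classpath.
    static final String TYPES_TXT = "META-INF/org.apache.uima.fit/types.txt";

    // Splits the manifest content into trimmed, non-empty lines (one pattern per line assumed).
    static List<String> parsePatterns(String content) {
        List<String> patterns = new ArrayList<>();
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(trimmed);
            }
        }
        return patterns;
    }

    // Finds every copy of the manifest visible to the given class loader.
    static List<URL> findManifests(ClassLoader cl) throws IOException {
        return Collections.list(cl.getResources(TYPES_TXT));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader cl = ClassLoader.getSystemClassLoader();
        for (URL url : findManifests(cl)) {
            System.out.println(url);
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String content = reader.lines().collect(Collectors.joining("\n"));
                for (String pattern : parsePatterns(content)) {
                    System.out.println("  " + pattern);
                }
            }
        }
    }
}
```

If none of the printed patterns covers the descriptor that declares SourceDocumentInformation, that would be consistent with the "not declared in the XML type descriptor" exception quoted further down.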
>
>
> If you create the analysis engine (descriptor) for a ruta script
> programmatically, there are sometimes additional configuration
> parameters that need to be set. In your use case, you import an additional
> analysis engine in your script. Such engines need to be listed in the
> corresponding configuration parameters, e.g., PARAM_ADDITIONAL_ENGINES
> or PARAM_ADDITIONAL_UIMAFIT_ENGINES. Since several of these parameters
> are rather technical, I normally use the generated descriptor in
> the uimaFIT factory.
>
>
> Best,
>
>
> Peter
>
>
> On 22.06.2016 at 21:55, Bonnie MacKellar wrote:
> > I am still trying to figure out how to count Ruta annotations across a
> > bunch of input files. There doesn't seem to be any Workbench way to do it.
> > So now I am trying to call Ruta from uimaFIT so I can do the job in Java.
> >
> > However, I am having serious configuration problems, plus I have a question
> > on how to bring in PlainTextAnnotator.
> >
> > I am using Maven, with the jcasgen-maven-plugin, the ruta-maven-plugin, and
> > the uimafit-maven-plugin. I will include the pom file at the end of this
> > post.
> >
> > I want my Java code to be aware of the types declared in the Ruta script -
> > that is the whole point - I want to count those annotations.
> >
> > My Ruta script also uses PlainTextAnnotator. The problem with this is that
> > I can't figure out where to put it. In a Workbench based Ruta project,
> > PlainTextAnnotator.xml and PlainTextAnnotatorTypeSystem get put
> > automatically into descriptor/utils, along with a number of other
> > descriptors that seem to be built into Ruta. But when I create a project
> > using maven, there is no such location, and these descriptors do not get
> > put anywhere. I tried a number of places but could not get my script to see
> > the type system for PlainTextAnnotator. Finally, I hit on putting the files
> > in target/generated-sources/ruta/descriptor/utils, and finally my script is
> > able to see the types and I can run it. This is good because at that point,
> > the ruta-maven-plugin does its job and generates the descriptors for my
> > script. However, I suspect this is not a good place to put the
> > PlainTextAnnotator files since doing a clean overwrites them. Where should
> > they go? Is there any entry in the pom file that is needed?
> >
> > The second problem is that although my Ruta script works nicely on its own,
> > the Java code fails. I get the following exception:
> > Exception in thread "main" org.apache.uima.cas.CASRuntimeException: JCas
> > type "org.apache.uima.examples.SourceDocumentInformation" used in Java
> > code, but was not declared in the XML type descriptor.
> > at org.apache.uima.jcas.impl.JCasImpl.getTypeInit(JCasImpl.java:435)
> > at org.apache.uima.jcas.impl.JCasImpl.getType(JCasImpl.java:408)
> > at org.apache.uima.jcas.cas.TOP.<init>(TOP.java:96)
> > at org.apache.uima.jcas.cas.AnnotationBase.<init>(AnnotationBase.java:66)
> > at org.apache.uima.jcas.tcas.Annotation.<init>(Annotation.java:54)
> > at org.apache.uima.examples.SourceDocumentInformation.<init>(SourceDocumentInformation.java:80)
> > at org.apache.uima.examples.cpe.FileSystemCollectionReader.getNext(FileSystemCollectionReader.java:162)
> > at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:149)
> > at PipelineSystem.<init>(PipelineSystem.java:59)
> > at PipelineSystem.main(PipelineSystem.java:73)
> >
> > I am guessing that I need to put some other descriptor somewhere but I
> > can't figure out what it might be. Here is the code that causes the problem:
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > import java.io.IOException;
> > import java.util.Iterator;
> >
> > import org.apache.uima.UIMAException;
> > import org.apache.uima.analysis_engine.AnalysisEngine;
> > import org.apache.uima.analysis_engine.AnalysisEngineDescription;
> > import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
> > import org.apache.uima.cas.Type;
> > import org.apache.uima.cas.TypeSystem;
> > import org.apache.uima.collection.CollectionReaderDescription;
> > import org.apache.uima.examples.cpe.FileSystemCollectionReader;
> > import org.apache.uima.fit.component.CasDumpWriter;
> > import org.apache.uima.fit.factory.AnalysisEngineFactory;
> > import org.apache.uima.fit.factory.CollectionReaderFactory;
> > import org.apache.uima.fit.pipeline.SimplePipeline;
> > import org.apache.uima.jcas.JCas;
> > import org.apache.uima.resource.ResourceInitializationException;
> > import org.apache.uima.ruta.engine.RutaEngine;
> >
> > public class PipelineSystem {
> >
> >     public PipelineSystem() throws IOException, UIMAException {
> >         try {
> >             CollectionReaderDescription readerDesc =
> >                     CollectionReaderFactory.createReaderDescription(
> >                             FileSystemCollectionReader.class,
> >                             FileSystemCollectionReader.PARAM_INPUTDIR,
> >                             "/home/bonnie/Research/eclipse-uima-projects/PipeLineWithRuta/input",
> >                             FileSystemCollectionReader.PARAM_ENCODING, "UTF-8",
> >                             FileSystemCollectionReader.PARAM_LANGUAGE, "English");
> >             AnalysisEngine rae = AnalysisEngineFactory.createEngine(
> >                     RutaEngine.class,
> >                     RutaEngine.PARAM_MAIN_SCRIPT, "ecClassifierRules");
> >             AnalysisEngineDescription rutaEngineDesc =
> >                     AnalysisEngineFactory.createEngineDescription(
> >                             RutaEngine.class,
> >                             RutaEngine.PARAM_MAIN_SCRIPT, "ecClassifierRules");
> >             AnalysisEngineDescription writerDesc =
> >                     AnalysisEngineFactory.createEngineDescription(
> >                             CasDumpWriter.class,
> >                             CasDumpWriter.PARAM_OUTPUT_FILE, "dump.txt");
> >             JCas jCas = rae.newJCas();
> >             SimplePipeline.runPipeline(readerDesc, rutaEngineDesc);
> >             displayRutaResults(jCas);
> >         } catch (ResourceInitializationException e) {
> >             // TODO Auto-generated catch block
> >             e.printStackTrace();
> >         } catch (AnalysisEngineProcessException e) {
> >             // TODO Auto-generated catch block
> >             e.printStackTrace();
> >         }
> >     }
> >
> >     public static void main(String[] args) throws IOException, UIMAException {
> >         PipelineSystem p = new PipelineSystem();
> >     }
> >
> >     public void displayRutaResults(JCas jCas) {
> >         System.out.println("in display ruta results");
> >         TypeSystem ts = jCas.getTypeSystem();
> >         Iterator<Type> typeItr = ts.getTypeIterator();
> >         while (typeItr.hasNext()) {
> >             Type type = (Type) typeItr.next();
> >             if (type.getName().equals("INCL")) {
> >                 System.out.println("INCL was found");
> >             }
> >         }
> >     }
> > }
> >
> ------------------------------------------------------------------------------------------------------------------------------------------------
> >
> > Yes, I know the code doesn't actually count annotations yet - this is
> > strictly a test of the configuration. The type INCL is declared in the
> > script:
> >
> > ENGINE utils.PlainTextAnnotator;
> > TYPESYSTEM utils.PlainTextTypeSystem;
> > Document{-> RETAINTYPE(BREAK)};
> > Document{-> EXEC(PlainTextAnnotator, {Line})};
> >
> > DECLARE INCL;
> > "INCLUSION" -> INCL;
> >
> > And finally, here is the pom file. I note that the ruta plugin and the
> > jcasgen plugin are correctly generating the descriptor files for the
> > script and the Java classes for the types. I have this set up so that the
> > jcasgen plugin reads the type descriptors from the folder that is generated
> > by the ruta-maven-plugin (I saw this in one of the examples mentioned
> > elsewhere on this mailing list).
> > However, the uimafit plugin does not generate anything.
> >
> > Thanks for any help. It is really hard to figure out all these moving parts.
> >
> > Bonnie MacKellar
> >
> >
> ---------------------------------------------------------------------------------------------------------------------------------
> >
> > <project xmlns="http://maven.apache.org/POM/4.0.0"
> >          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
> >   <modelVersion>4.0.0</modelVersion>
> >   <groupId>PipeLineWithRuta</groupId>
> >   <artifactId>PipeLineWithRuta</artifactId>
> >   <version>0.0.1-SNAPSHOT</version>
> >   <packaging>jar</packaging>
> >   <name>PipeLineWithRuta</name>
> >   <url>http://maven.apache.org</url>
> >   <properties>
> >     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
> >   </properties>
> >   <build>
> >     <sourceDirectory>src/main/java</sourceDirectory>
> >     <resources>
> >       <resource>
> >         <directory>src/main/ruta</directory>
> >       </resource>
> >       <resource>
> >         <directory>src/desc</directory>
> >       </resource>
> >     </resources>
> >     <plugins>
> >       <plugin>
> >         <artifactId>maven-compiler-plugin</artifactId>
> >         <version>3.3</version>
> >         <configuration>
> >           <source>1.8</source>
> >           <target>1.8</target>
> >         </configuration>
> >       </plugin>
> >       <plugin>
> >         <groupId>org.apache.uima</groupId>
> >         <artifactId>jcasgen-maven-plugin</artifactId>
> >         <version>2.4.1</version> <!-- change this to the latest version -->
> >         <executions>
> >           <execution>
> >             <goals>
> >               <goal>generate</goal>
> >             </goals>
> >             <!-- this is the only goal -->
> >             <!-- runs in phase process-resources by default -->
> >             <configuration>
> >               <!-- REQUIRED -->
> >               <typeSystemIncludes>
> >                 <!-- one or more ant-like file patterns identifying top level descriptors -->
> >                 <typeSystemInclude>target/generated-sources/ruta/descriptor/ecClassifierRulesTypeSystem.xml</typeSystemInclude>
> >               </typeSystemIncludes>
> >               <!-- OPTIONAL -->
> >               <!-- a sequence of ant-like file patterns to exclude from the above include list -->
> >               <typeSystemExcludes>
> >               </typeSystemExcludes>
> >               <!-- OPTIONAL -->
> >               <!-- where the generated files go -->
> >               <!-- default value: ${project.build.directory}/generated-sources/jcasgen -->
> >               <outputDirectory>
> >               </outputDirectory>
> >               <!-- true or false, default = false -->
> >               <!-- if true, then although the complete merged type system will be
> >                    created internally, only those types whose definition is contained
> >                    within this maven project will be generated. The others will be
> >                    presumed to be available via other projects. -->
> >               <!-- OPTIONAL -->
> >               <limitToProject>true</limitToProject>
> >             </configuration>
> >           </execution>
> >         </executions>
> >       </plugin>
> >       <plugin>
> >         <groupId>org.apache.uima</groupId>
> >         <artifactId>ruta-maven-plugin</artifactId>
> >         <version>2.3.1</version>
> >         <configuration>
> >           <scriptPaths>
> >             <scriptPath>src/main/ruta/</scriptPath>
> >           </scriptPaths>
> >           <!-- Descriptor paths of the generated analysis engine descriptor. -->
> >           <!-- default value: none -->
> >           <descriptorPaths>
> >             <descriptorPath>${project.build.directory}/generated-sources/ruta/descriptor</descriptorPath>
> >           </descriptorPaths>
> >           <!-- Resource paths of the generated analysis engine descriptor. -->
> >           <!-- default value: none -->
> >           <resourcePaths>
> >             <resourcePath>${project.build.directory}/generated-sources/ruta/resources/</resourcePath>
> >           </resourcePaths>
> >           <analysisEngineSuffix>Engine</analysisEngineSuffix>
> >           <typeSystemSuffix>TypeSystem</typeSystemSuffix>
> >           <!-- Type of type system imports. false = import by location. -->
> >           <!-- default value: false -->
> >           <importByName>false</importByName>
> >           <!-- Option to resolve imports while building. -->
> >           <!-- default value: false -->
> >           <resolveImports>false</resolveImports>
> >           <!-- List of packages with language extensions -->
> >           <!-- default value: none -->
> >           <extensionPackages>
> >             <extensionPackage>org.apache.uima.ruta</extensionPackage>
> >           </extensionPackages>
> >           <!-- Add UIMA Ruta nature to .project -->
> >           <!-- default value: false -->
> >           <addRutaNature>true</addRutaNature>
> >           <!-- Buildpath of the UIMA Ruta Workbench (IDE) for this project -->
> >           <!-- default value: none -->
> >           <buildPaths>
> >             <buildPath>script:src/main/ruta/</buildPath>
> >             <buildPath>descriptor:target/generated-sources/ruta/descriptor/</buildPath>
> >             <buildPath>resources:src/main/resources/</buildPath>
> >           </buildPaths>
> >         </configuration>
> >         <executions>
> >           <execution>
> >             <id>default</id>
> >             <phase>process-classes</phase>
> >             <goals>
> >               <goal>generate</goal>
> >             </goals>
> >           </execution>
> >         </executions>
> >       </plugin>
> >       <plugin>
> >         <groupId>org.apache.uima</groupId>
> >         <artifactId>uimafit-maven-plugin</artifactId>
> >         <version>2.2.0</version> <!-- change to latest version -->
> >         <configuration>
> >           <!-- OPTIONAL -->
> >           <!-- Path where the generated resources are written. -->
> >           <outputDirectory>${project.build.directory}/generated-sources/uimafit</outputDirectory>
> >           <!-- OPTIONAL -->
> >           <!-- Skip generation of META-INF/org.apache.uima.fit/components.txt -->
> >           <skipComponentsManifest>false</skipComponentsManifest>
> >           <!-- OPTIONAL -->
> >           <!-- Source file encoding. -->
> >           <encoding>${project.build.sourceEncoding}</encoding>
> >         </configuration>
> >         <executions>
> >           <execution>
> >             <id>default</id>
> >             <phase>process-classes</phase>
> >             <goals>
> >               <goal>generate</goal>
> >             </goals>
> >           </execution>
> >         </executions>
> >       </plugin>
> >     </plugins>
> >   </build>
> >   <dependencies>
> >     <dependency>
> >       <groupId>org.apache.uima</groupId>
> >       <artifactId>uimafit-core</artifactId>
> >       <version>2.2.0</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>org.apache.uima</groupId>
> >       <artifactId>uimaj-core</artifactId>
> >       <version>2.8.1</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>org.apache.uima</groupId>
> >       <artifactId>ruta-maven-plugin</artifactId>
> >       <version>2.3.1</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>org.apache.uima</groupId>
> >       <artifactId>uimaj-cpe</artifactId>
> >       <version>2.8.1</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>org.apache.uima</groupId>
> >       <artifactId>uimaj-examples</artifactId>
> >       <version>2.8.1</version>
> >     </dependency>
> >   </dependencies>
> > </project>
> >
>
>
