uima-user mailing list archives

From Bonnie MacKellar <bkmackel...@gmail.com>
Subject best practice for building RUTA scripts in Eclipse when they are to be run in Java?
Date Thu, 14 Jan 2016 15:13:24 GMT

I just spent the last 4 days stumbling through the documentation,
tutorials, posts to this mailing list, and any code examples I could find
on the Internet, so I could integrate the Metamap annotator and a RUTA
script in Java using uimaFIT. I succeeded, and I have something that runs,
but I doubt I am organizing things the best way in Eclipse, and in
particular, I am noticing some odd things if I try to build and test the
script first in the Ruta development environment in Eclipse and then move
the script to my Java environment. I suspect my workflow is not the best
possible, so I am looking for advice on how to manage this.

My project was created as a Ruta project so I could have the development
environment support. I then added the UIMA nature to the project to get the
Java development folders. I set up the type descriptors for Metamap, and
after much reading, realized I needed a types.txt file in my source folder
that tells the system how to find the Metamap type descriptors. I then
added the Ruta script to the pipeline in my Java class and copied the
type descriptor for that script down to my source folders as well. Finally, I
realized I needed Java classes for the types, and that pressing the JCasGen
button in the Component Descriptor Editor was the way to generate them.
However, there are some anomalies when I do this.
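
For readers unfamiliar with the types.txt mechanism: as far as I understand
it, uimaFIT scans the classpath for META-INF/org.apache.uima.fit/types.txt
and treats each line as a pattern that locates type system descriptors. A
minimal sketch, where the desc/types folder is a hypothetical location and
not necessarily the layout of my project:

```text
classpath*:desc/types/**/*.xml
```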

So, my project has this structure at the top level:

[image: Inline image 1]

and this structure at the src level. Notice that the Ruta script
and types have been copied down to this level:

[image: Inline image 2]

The code that creates the AnalysisEngineDescriptions and runs the pipeline
looks like this (it is in PipelineSystem.java):

try {
ae =
AnalysisEngineDescription mmEngineDesc =

AnalysisEngine rae = AnalysisEngineFactory.createEngine(RutaEngine.class,
AnalysisEngineDescription rutaEngineDesc =
JCas jCas = ae.newJCas();
jCas.setDocumentText("serum albumin greater or equal 2g/dL");
SimplePipeline.runPipeline(jCas, mmEngineDesc, rutaEngineDesc);

and the types.txt file contains this

If I want to use the Ruta Workbench to develop my Ruta script, it appears
that I have to regenerate the Java type files, such as Relational.java,
each time I make a change. Is that correct?
And when I do this, I notice that it completely regenerates the
org.apache.uima.ruta.type hierarchy, which leads to an odd runtime error
(NoSuchMethodException, caused by trying to call setLowMemoryProfile). I
read a thread on this list about this error, which recommended deleting the
regenerated UIMA type hierarchy. This worked, but it seems I have to go
through these steps every time I regenerate the Ruta types, which is a pain.

Also, I notice that the Metamap type hierarchy is also regenerated inside
my project. I suspect this is because of the imports in my Ruta script:
TYPESYSTEM BasicTypeSystem;
TYPESYSTEM BasicMetaMapTypeSystem;
TYPESYSTEM MetaMapApiTypeSystem;
DECLARE Relational,UMLSConcept;
Candidate{ -> MARK(UMLSConcept)};

Is this not the right way to make my script aware of the Metamap types?

I also notice that in the type descriptor, this import is generated twice:
        <import location="BasicTypeSystem.xml"/>
        <import location="BasicTypeSystem.xml"/>
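
A duplicated import like this is easy to spot programmatically. The
following is a minimal sketch that uses only the JDK's XML parser; the
class and method names are made up for illustration and are not part of
UIMA or Ruta:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DuplicateImportCheck {

    /** Returns each import location that occurs more than once, in document order. */
    static List<String> duplicateImports(String descriptorXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        NodeList imports = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(descriptorXml)))
                .getElementsByTagName("import");

        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (int i = 0; i < imports.getLength(); i++) {
            String location = ((Element) imports.item(i)).getAttribute("location");
            if (location.isEmpty()) {
                continue; // import has no location attribute (e.g. declared by name)
            }
            // Set.add returns false when the location was already seen.
            if (!seen.add(location) && !duplicates.contains(location)) {
                duplicates.add(location);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical descriptor fragment mirroring the duplicated import above.
        String xml = "<typeSystemDescription><imports>"
                + "<import location=\"BasicTypeSystem.xml\"/>"
                + "<import location=\"BasicTypeSystem.xml\"/>"
                + "</imports></typeSystemDescription>";
        System.out.println(duplicateImports(xml));
    }
}
```

It only looks at location attributes, so imports declared by name are
skipped rather than compared.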

In general, is it a good or bad idea to develop the Ruta script in the
workbench and then copy its pieces into the Java source folder? It seems
like a very convoluted process.

Thanks for your help

Bonnie MacKellar
