uima-user mailing list archives

From Bonnie MacKellar <bkmackel...@gmail.com>
Subject Re: best practice for building RUTA scripts in Eclipse when they are to be run in Java?
Date Thu, 14 Jan 2016 17:45:23 GMT

Sorry that the images did not go through! I had hoped to avoid typing it out.

Here it is
The top level

  - bin
              -PipelineSystem.java (this is the main application)
        -gov  (all of the Metamap types were copied here for some reason)
             -testrules.ruta  (copied from above)
        -testrules (these are the types generated by jCasGen, which come
from my Ruta script)

Unfortunately I am rather new to all of this, so I am not totally following
some of your answers. I put my comments and questions inline. The
description of integrating Ruta into Java in the tutorial doesn't say much
about project layout in Eclipse so I had to do lots of internet searches
and copy what I found, so I am sure I am not doing things in the best way.

On Thu, Jan 14, 2016 at 11:19 AM, Peter Klügl <peter.kluegl@averbis.com>

> Hi,
> just a few short first comments... more tomorrow...
> - Unfortunately, the images did not make it (due to the mailing list
> settings?). You can send me the mail directly if you want.
> - I now really prefer to develop Ruta scripts in Maven-built projects. Is
> Maven an option for you?

I don't know Maven very well and really did not want to add another layer
of complexity to this already very complex system. How does Maven help?

> - You can limit JCasGen to the current project. Then, only local type
> systems are used to generate the classes and the problem with overriding
> RutaBasic is avoided. However, if you copy the descriptors, that does
> not help.

I copied the descriptors, and then used jCasGen on the descriptor down in
the src folder. How do I limit JCasGen?

> - JCasGen on generated type systems of ruta scripts can be tricky
> (because ruta imports the BasicTypeSystem by default and this one should
> not be generated anew). I would rather recommend defining the JCas cover
> class types in separate type systems.

Not sure I follow what this means

> - Copying descriptors should be avoided in general

So how do I develop using the workbench but get the results into Java? All
of the posts I read stressed that you have to have the type descriptors and
script under the src folder, but the workbench doesn't want them there. The
only online example I could find that uses both MetaMap and Ruta had a
layout kind of similar to mine.

> - Do you need the descriptors of ruta at all? Did you define new types
> in ruta scripts? The java code does not make use of the ruta descriptors

Yes, I define new types, and eventually there will be lots of types. The
current Ruta script is a test case only. I am not sure what you mean when
you say the java code does not make use of the descriptors. If I don't have
them, I get lots of runtime errors.

> - The way you create the ruta descriptors in the java example does not
> support all ruta functionality, e.g., new types

I am probably doing it completely wrong :-). I couldn't find many examples.

> - The duplicate import is fixed in the next release
OK, good to know that this is not something I am doing wrong.

> - Is the code open source somewhere, e.g., on github?
No, because it is test code only right now.

> Best,
> Peter
> Am 14.01.2016 um 16:13 schrieb Bonnie MacKellar:
> > Hi,
> >
> > I just spent the last 4 days stumbling through the documentation,
> > tutorials, posts to this mailing list, and any code examples I could
> > find on the Internet, so I could integrate the Metamap annotator and a
> > RUTA script in Java using UimaFit. I succeeded, and I have something
> > that runs, but I doubt I am organizing things the best way in Eclipse,
> > and in particular, I am noticing some odd things if I try to build and
> > test the script first in the Ruta development environment in Eclipse
> > and then move the script to my Java environment. I suspect my workflow
> > is not the best possible, so I am looking for advice on how to manage
> > this.
> >
> > My project was created as a Ruta project so I could have the
> > development environment support. I then added Uima nature to the
> > project to get the Java development folders. I set up the type
> > descriptors for Metamap, and after much reading, realized I needed a
> > types.txt file in my source folder that tells the system how to find
> > the Metamap type descriptors. I then added the Ruta script to the
> > pipeline in my Java class and then copied the type descriptor for that
> > down to my source folders as well. Finally, I realized I needed java
> > classes for the types, and that pressing a jCasGen button in the
> > ComponentDescriptorEditor was the way to do that. However, there are
> > some anomalies when I do this.
> >
> > So, my project has this structure at the top level
> >
> > Inline image 1
> >
> > and at the src level, this is the structure. Notice that the Ruta
> > script and types have been copied down to this level
> >
> > Inline image 2
> >
> >
> > The code that creates the AnalysisEngineDescriptors and runs the
> > pipeline looks like this (it is in PipelineSystem.java):
> >
> > try {
> >     ae = AnalysisEngineFactory.createEngine(
> >             gov.nih.nlm.nls.metamap.uima.MetaMapAnnotator.class);
> >     AnalysisEngineDescription mmEngineDesc =
> >             AnalysisEngineFactory.createEngineDescription(
> >                     gov.nih.nlm.nls.metamap.uima.MetaMapAnnotator.class);
> >
> >     AnalysisEngine rae =
> >             AnalysisEngineFactory.createEngine(RutaEngine.class, "testrules");
> >     AnalysisEngineDescription rutaEngineDesc =
> >             AnalysisEngineFactory.createEngineDescription(RutaEngine.class, "testrules");
> >     JCas jCas = ae.newJCas();
> >     jCas.setDocumentText("serum albumin greater or equal 2g/dL");
> >     SimplePipeline.runPipeline(jCas, mmEngineDesc, rutaEngineDesc);
> >     displayResults(jCas);
> >     displayRutaResults(jCas);
> >
> > and the types.txt file contains this
> > classpath*:desc/types/MetaMapApiTypeSystem.xml
> > classpath*:desc/types/BasicTypeSystem.xml
> > classpath*:desc/types/InternalTypeSystem.xml
> > classpath*:desc/types/testrulesTypeSystem.xml
> >
> >
> > If I want to use the Ruta Workbench to develop my Ruta script, it
> > appears that I have to regenerate the java type files, such as
> > Relational.java, each time I make a change. Is that correct?
> > And when I do this, I notice that it completely regenerates the
> > org.apache.uima.ruta.type hierarchy, which leads to an odd runtime
> > error (NoSuchMethodException, caused by trying to call
> > setLowMemoryProfile). I read a thread on this list about this error
> > which recommended deleting the regenerated uima type hierarchy. This
> > worked, but it seems I have to go through these steps every time I
> > regenerate the Ruta types, which is a pain.
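
The manual cleanup described above (deleting the regenerated org.apache.uima.ruta.type classes after each JCasGen run) could be scripted. What follows is only a minimal sketch using the Java standard library, not part of UIMA or Ruta. The class name RutaTypeCleanup, the method name, and the default source root "src" are hypothetical, and the sketch assumes that the classes shipped in the Ruta jar are the ones that should be used at runtime.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of UIMA or Ruta.
final class RutaTypeCleanup {

    /**
     * Deletes <sourceRoot>/org/apache/uima/ruta/type and everything below it.
     * Returns the number of files and directories that were deleted.
     */
    static int deleteRegeneratedRutaTypes(Path sourceRoot) throws IOException {
        Path generated = sourceRoot.resolve(Paths.get("org", "apache", "uima", "ruta", "type"));
        if (!Files.isDirectory(generated)) {
            return 0; // nothing was regenerated, nothing to do
        }
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(generated)) {
            // Reverse order so that children are deleted before their parent directories.
            toDelete = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return toDelete.size();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the Java source folder is "src" unless a path is given.
        Path sourceRoot = Paths.get(args.length > 0 ? args[0] : "src");
        int removed = deleteRegeneratedRutaTypes(sourceRoot);
        System.out.println("Removed " + removed + " regenerated path(s) under " + sourceRoot);
    }
}
```

This only automates the workaround; restricting JCasGen so that the Ruta base types are not regenerated in the first place would make it unnecessary.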
> >
> > Also, I notice that the metamap type hierarchy is also regenerated
> > inside my project. I theorize it is because of the import in my Ruta
> > type descriptor
> > TYPESYSTEM BasicTypeSystem;
> > TYPESYSTEM BasicMetaMapTypeSystem;
> > TYPESYSTEM MetaMapApiTypeSystem;
> > DECLARE Relational,UMLSConcept;
> > Candidate{ -> MARK(UMLSConcept)};
> >
> > Is this not the right way to make my script aware of the Metamap types?
> >
> > I also notice that in the type descriptor, this import is generated twice
> > <imports>
> >         <import location="BasicTypeSystem.xml"/>
> >         <import location="BasicTypeSystem.xml"/>
> >     </imports>
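
If the doubled import gets in the way before a fixed release is available, the generated descriptor could be post-processed. The following is a minimal sketch using only the JDK's DOM API, not anything provided by Ruta. The class and method names are hypothetical, and it assumes that two import elements are duplicates when their location and name attributes are both equal.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of UIMA or Ruta.
final class DuplicateImportRemover {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Removes repeated import elements in place and returns how many were removed. */
    static int removeDuplicateImports(Document doc) {
        // The NodeList is live, so copy it before removing any nodes.
        NodeList live = doc.getElementsByTagName("import");
        List<Element> imports = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            imports.add((Element) live.item(i));
        }
        Set<String> seen = new HashSet<>();
        int removed = 0;
        for (Element imp : imports) {
            // Assumption: an import is identified by its location and name attributes.
            String key = imp.getAttribute("location") + "|" + imp.getAttribute("name");
            if (!seen.add(key)) {
                imp.getParentNode().removeChild(imp);
                removed++;
            }
        }
        return removed;
    }

    static String toXml(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The fragment quoted in this message, reduced to the imports element.
        String xml = "<imports>"
                + "<import location=\"BasicTypeSystem.xml\"/>"
                + "<import location=\"BasicTypeSystem.xml\"/>"
                + "</imports>";
        Document doc = parse(xml);
        int removed = removeDuplicateImports(doc);
        System.out.println("Removed " + removed + " duplicate import(s)");
        System.out.println(toXml(doc));
    }
}
```

Since the workbench regenerates the descriptor, such a step would have to run after every regeneration, so it is at best a stopgap.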
> >
> > In general, is it a good or bad idea to develop the Ruta script in the
> > workbench and then copy its pieces into the Java source folder? It
> > seems like a very convoluted process.
> >
> > Thanks for your help
> >
> > Bonnie MacKellar
