uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: best practice for building RUTA scripts in Eclipse when they are to be run in Java?
Date Mon, 18 Jan 2016 15:09:58 GMT
Hi,

comments inline...

Some background information first:
The UIMA Ruta Workbench and its project layout were initially designed to
support the user in developing rules. Thus, the project structure is "as
simple as possible" and tailored for this task. However, the deployment
and integration of the rules in actual applications was neglected: there
is no direct support for generic build tools, e.g., for packaging the
rules and descriptors correctly in a jar. This often leads to situations
where users need to perform manual steps, which is a no-go for larger
projects.

In my opinion, maven has more advantages than disadvantages and is a
must for development of larger java projects. Thus, I created a maven
plugin that does what the Workbench used to be responsible for: it
creates the descriptors and compiles the wordlists. I also adapted the
Workbench to work with normal maven based projects. So now, at least
when I write serious ruta rules, I do not use the old project layout in
the Workbench, but a common maven project with additional plugins. There
you can specify the packaging of your ruta based application or just
treat the ruta files as resources in a java project, and you can add
automatic unit tests. It's much more flexible and powerful and I think
it's worth the trouble. Sometimes I also use the old project layout, but
just for some quick testing.

I cannot provide a general introduction to maven. It is best to start
with an existing project and learn by modifying and adapting it to your
use case.

Am 14.01.2016 um 18:45 schrieb Bonnie MacKellar:
> Hi,
>
> Sorry that the images did not go through! I had hoped to avoid typing
>
> Here it is
> The top level
>
> ECClassifier
>   - bin
>       - desc
>   - descriptor
>       - utils
>       - BasicEngine.xml
>       - BasicTypeSystem.xml
>       - InternalTypeSystem.xml
>       - testrulesEngine.xml
>       - testrulesTypeSystem.xml
>   - input
>   - metadata
>   - output
>   - script
>       - testrules.ruta
>   - src
>       - desc
>           - types
>               - BasciMetaMapTypeSystem.xml
>               - BasicTypeSystem.xml
>               - InternalTypeSystem.xml
>               - MetaMapApiAE.xml
>               - MetaMapApiTypeSystem.xml
>               - testrulesTypeSystem.xml
>       - ec
>           - metamap
>               - PipelineSystem.java (this is the main application)
>       - gov (all of the Metamap types were copied here for some reason)
>       - META-INF
>           - org.apache.uima.fit
>               - types.txt
>       - ruta
>           - ec
>               - testrules.ruta (copied from above)
>       - testrules (these are the types generated by jCasGen, which come
>         from my Ruta script)
>           - Relational_Type.java
>           - Relational.java
>           - UMLSConcept_type.java
>           - UMLSConcept.java
>
> Unfortunately I am rather new to all of this, so I am not totally following
> some of your answers. I put my comments and questions inline. The
> description of integrating Ruta into Java in the tutorial doesn't say much
> about project layout in Eclipse so I had to do lots of internet searches
> and copy what I found, so I am sure I am not doing things in the best way.

What was missing for you? I should link/add more examples. I will add
more documentation when I find the time.


> On Thu, Jan 14, 2016 at 11:19 AM, Peter Klügl <peter.kluegl@averbis.com>
> wrote:
>
>> Hi,
>>
>> just a few short first comments... more tomorrow...
>>
>> - Unfortunately, the images did not make it (due to the mailing list
>> settings?). You can send me the mail directly if you want.
>> - I really prefer now to develop ruta script in maven built projects. Is
>> maven an option for you?
>>
> I don't know Maven very well and really did not want to add another layer
> of complexity to this already very complex system. How does Maven help?

It automates the manual steps, like generating the JCas classes and
copying the descriptors. There is a maven plugin for JCasGen and one for
ruta. Thus the build process results in exactly the jar you want. It's
really terrible in the beginning, but I would not want to be without it
now.

>
>> - You can limit JCasGen to the current project. Then, only local type
>> systems are used to generate the classes and the problem with overriding
>> RutaBasic is avoided. However, if you copy the descriptors, that does
>> not help.
>>
> I copied the descriptors, and then used jCasGen on the descriptor down in
> the src folder. How do I limit JCasGen?
>

There is a checkbox right below the JCasGen button in the component
descriptor editor. The maven plugin also supports this option. If you
want to generate JCas classes from ruta scripts, then you have to take
care that the types of BasicTypeSystem are not generated as well. The
only way I see right now to avoid that is this option, but it only works
if the type system is located in a different project. Thus, no copying
and no old project layout (or some hacks). With the maven plugin, the
descriptors can be loaded from, e.g., the classpath of the project, and
do not need to be located within the project.
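
A quick way to notice when JCasGen has regenerated the Ruta classes
anyway is to look for Java sources in the org.apache.uima.ruta.type
package below your own source folder. The following is only a minimal
sketch using plain JDK classes; the class name RegeneratedRutaTypes and
the default source root "src" are assumptions made for illustration and
are not part of Ruta or JCasGen.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Ruta: reports locally generated copies
// of the Ruta type classes (e.g. RutaBasic.java) inside a source folder.
final class RegeneratedRutaTypes {

    /** Lists .java files below sourceRoot/org/apache/uima/ruta/type, sorted. */
    static List<Path> find(Path sourceRoot) {
        Path rutaTypes = sourceRoot.resolve("org/apache/uima/ruta/type");
        if (!Files.isDirectory(rutaTypes)) {
            // The package folder does not exist, so nothing was regenerated.
            return Collections.emptyList();
        }
        try (Stream<Path> files = Files.walk(rutaTypes)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".java"))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the source folder is "src" unless a path is given.
        Path sourceRoot = Paths.get(args.length > 0 ? args[0] : "src");
        List<Path> found = find(sourceRoot);
        if (found.isEmpty()) {
            System.out.println("No regenerated Ruta type classes below " + sourceRoot);
        } else {
            found.forEach(p -> System.out.println("Regenerated: " + p));
        }
    }
}
```

Any file it reports was generated inside the project and is a candidate
for deletion, which is the manual workaround mentioned in the original
question further down in this thread.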

>> - JCasGen on generated type systems of ruta scripts can be tricky
>> (because ruta imports the BasicTypeSystem by default and this one should
>> not be generated anew). I rather recommend defining JCas cover class
>> types in separate type systems.
>>
> Not sure I follow what this means
>

I normally have additional typesystem descriptors in my ruta projects
which contain the types that are used to generate the java classes. The
types defined in the ruta scripts are not used for generating the JCas
classes but only for intermediate annotations. This is probably only a
personal preference.


>> - Copying descriptors should be avoided in general
>>
> So how do I develop using the workbench but get the results into Java? All
> of the posts I read stressed that you have to have the type descriptors and
> script under the src folder, but the workbench doesn't want them there. The
> only online example I could find that uses both MetaMap and Ruta had a
> layout kind of similar to mine.

The maven plugin is able to generate the buildpath that is required by
the Workbench to work correctly. It contains the ruta specific source
paths. So the Workbench can also be used in normal java projects if they
are built with maven. Then, everything is automatically usable in Java.

>
>> - Do you need the descriptors of ruta at all? Did you define new types
>> in ruta scripts? The java code does not make use of the ruta descriptors
>>
> Yes, I define new types, and eventually there will be lots of types. The
> current Ruta script is a test case only. I am not sure what you mean when
> you say the java code does not make use of the descriptors. If I don't have
> them, I get lots of runtime errors.
>

Hmmm... I missed the types.txt, which is what makes the types
available.
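
Since missing descriptors only show up as runtime errors, it can help to
check that every entry in types.txt actually resolves on the classpath.
The following is a minimal sketch using plain JDK classes; the class name
TypesTxtCheck is made up for illustration, and it only handles literal
paths with the "classpath*:" prefix as used in this thread, not wildcard
patterns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of uimaFIT: checks types.txt-style entries.
final class TypesTxtCheck {

    // Prefix used by the types.txt entries quoted in this thread.
    private static final String PREFIX = "classpath*:";

    /** Turns types.txt lines into plain classpath resource names. */
    static List<String> resourceNames(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            names.add(trimmed.startsWith(PREFIX)
                ? trimmed.substring(PREFIX.length())
                : trimmed);
        }
        return names;
    }

    /** Returns the entries that the given class loader cannot find. */
    static List<String> missing(List<String> lines, ClassLoader loader) {
        List<String> notFound = new ArrayList<>();
        for (String name : resourceNames(lines)) {
            if (loader.getResource(name) == null) {
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "classpath*:desc/types/MetaMapApiTypeSystem.xml",
            "classpath*:desc/types/testrulesTypeSystem.xml");
        System.out.println("Missing: "
            + missing(lines, TypesTxtCheck.class.getClassLoader()));
    }
}
```

Run with the same classpath as the application, an empty "Missing" list
suggests that the listed descriptors are at least reachable; it says
nothing about whether their contents are correct.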

Your code lines
    AnalysisEngine rae =
        AnalysisEngineFactory.createEngine(RutaEngine.class,
            RutaEngine.PARAM_MAIN_SCRIPT, "testrules");
    AnalysisEngineDescription rutaEngineDesc =
        AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
            RutaEngine.PARAM_MAIN_SCRIPT, "testrules");

use the RutaEngine implementation in order to create the analysis engine
description. This description does not refer to the types defined in the
ruta scripts. This works for simple script projects but will fail for
more complicated ones, e.g., when you have several scripts importing each
other. There is a helper class in ruta for generating descriptors from
scripts including types:
org.apache.uima.ruta.descriptor.RutaDescriptorFactory


>> - The way you create the ruta descriptors in the java example does not
>> support all ruta functionality, e.g., new types
>>
> I am probably doing it completely wrong :-). I couldn't find many examples.

No, a lot of functionality was added but not as much documentation. Let
me know if you want to have more information about the
RutaDescriptorFactory. The maven plugin uses this class to generate the
descriptors.

I hope this helps a bit. Just ask if you have more questions or if
something is not clear.

Best,

Peter


>
>> - The duplicate import is fixed in the next release
>>
> OK, good to know that this is not something I am doing wrong.
>
>
>> - Is the code open source somewhere, e.g., on github?
>>
> No, because it is test code only right now.
>
>> Best,
>>
>> Peter
>>
>> Am 14.01.2016 um 16:13 schrieb Bonnie MacKellar:
>>> Hi,
>>>
>>> I just spent the last 4 days stumbling through the documentation,
>>> tutorials, posts to this mailing list, and any code examples I could
>>> find on the Internet, so I could integrate the Metamap annotator and a
>>> RUTA script in Java using UimaFit. I succeeded, and I have something
>>> that runs, but I doubt I am organizing things the best way in Eclipse,
>>> and in particular, I am noticing some odd things if I try to build and
>>> test the script first in the Ruta development environment in Eclipse
>>> and then move the script to my Java environment. I suspect my workflow
>>> is not the best possible, so I am looking for advice on how to manage
>>> this.
>>>
>>> My project was created as a Ruta project so I could have the
>>> development environment support. I then added Uima nature to the
>>> project to get the Java development folders. I set up the type
>>> descriptors for Metamap, and after much reading, realized I needed a
>>> types.txt file in my source folder that tells the system how to find
>>> the Metamap type descriptors. I then added the Ruta script to the
>>> pipeline in my Java class and then copied the type descriptor for that
>>> down to my source folders as well. Finally, I realized I needed java
>>> classes for the types, and that pressing a jCasGen button in the
>>> ComponentDescriptorEditor was the way to do that. However, there are
>>> some anomalies when I do this.
>>>
>>> So, my project has this structure at the top level
>>>
>>> Inline image 1
>>>
>>> and at the src level, this is the structure. Notice that the Ruta
>>> script and types have been copied down to this level
>>>
>>> Inline image 2
>>>
>>>
>>> The code that creates the AnalysisEngineDescriptors and runs the
>>> pipeline looks like this (it is in PipelineSystem.java)
>>>
>>> try {
>>>     ae = AnalysisEngineFactory.createEngine(
>>>         gov.nih.nlm.nls.metamap.uima.MetaMapAnnotator.class);
>>>     AnalysisEngineDescription mmEngineDesc =
>>>         AnalysisEngineFactory.createEngineDescription(
>>>             gov.nih.nlm.nls.metamap.uima.MetaMapAnnotator.class);
>>>     AnalysisEngine rae =
>>>         AnalysisEngineFactory.createEngine(RutaEngine.class,
>>>             RutaEngine.PARAM_MAIN_SCRIPT, "testrules");
>>>     AnalysisEngineDescription rutaEngineDesc =
>>>         AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
>>>             RutaEngine.PARAM_MAIN_SCRIPT, "testrules");
>>>     JCas jCas = ae.newJCas();
>>>     jCas.setDocumentText("serum albumin greater or equal 2g/dL");
>>>     SimplePipeline.runPipeline(jCas, mmEngineDesc, rutaEngineDesc);
>>>     displayResults(jCas);
>>>     displayRutaResults(jCas);
>>>
>>> and the types.txt file contains this
>>> classpath*:desc/types/MetaMapApiTypeSystem.xml
>>> classpath*:desc/types/BasicTypeSystem.xml
>>> classpath*:desc/types/InternalTypeSystem.xml
>>> classpath*:desc/types/testrulesTypeSystem.xml
>>>
>>>
>>> If I want to use the Ruta Workbench to develop my Ruta script, it
>>> appears that I have to regenerate the java type files, such as
>>> Relational.java, each time I make a change. Is that correct?
>>> And when I do this, I notice that it completely regenerates the
>>> org.apache.uima.ruta.type hierarchy, which leads to an odd runtime
>>> error (NoSuchMethodException, caused by trying to call
>>> setLowMemoryProfile). I read a chain on this list about this error
>>> which recommended to delete the regenerated uima type hierarchy. This
>>> worked, but it seems I have to go through these steps every time I
>>> regenerate the Ruta types, which is a pain.
>>>
>>> Also, I notice that the metamap type hierarchy is also regenerated
>>> inside my project. I theorize it is because of the import in my Ruta
>>> type descriptor
>>> TYPESYSTEM BasicTypeSystem;
>>> TYPESYSTEM BasicMetaMapTypeSystem;
>>> TYPESYSTEM MetaMapApiTypeSystem;
>>> DECLARE Relational,UMLSConcept;
>>> Candidate{ -> MARK(UMLSConcept)};
>>>
>>> is this not the right way to make my script aware of the Metamap types?
>>>
>>> I also notice that in the type descriptor, this import is generated twice
>>> <imports>
>>>         <import location="BasicTypeSystem.xml"/>
>>>         <import location="BasicTypeSystem.xml"/>
>>>     </imports>
>>>
>>> In general, is it a good or bad idea to develop the Ruta script in the
>>> workbench and then copy its pieces into the Java source folder? It
>>> seems like a very convoluted process.
>>>
>>> Thanks for your help
>>>
>>> Bonnie MacKellar
>>

