uima-user mailing list archives

From "Matthias Holdorf" <matthias.hold...@gmail.com>
Subject Unobtrusive integration of Ruta-Scripts in Java Application
Date Fri, 19 Aug 2016 14:36:05 GMT

I am developing a rather large application with UIMA and RUTA in Java.

The goal is to detect and mark readability anomalies in a .docx document
with comments. For the rule part I use RUTA. While I already dealt with the
shortcomings of .docx APIs and parsing text, I'm about to fine-tune my

My objective: a user adds a ruta-script to the directory, and the rule is
applied without further configuration. The relevant directory layout looks
like the following:

src/main/resources .
META-INF/org.apache.uima.fit/types.txt (path to type systems)
type-system/MainTypeSystem.xml (where I define my own rules)
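
To make the "drop a script into the directory" idea concrete, here is a minimal sketch of the discovery step only, using nothing but the Java standard library. The class name RutaScriptFinder, the method findScripts and the ".ruta" file suffix filter are assumptions for illustration; the directory to scan is whatever the application chooses, and handing the scripts to the Ruta engine is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds script files a user has dropped into a directory.
final class RutaScriptFinder {

    // Returns all regular files ending in ".ruta" directly inside dir, sorted by path.
    static List<Path> findScripts(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".ruta"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // The directory is passed on the command line; defaults to the working directory.
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        findScripts(dir).forEach(System.out::println);
    }
}
```

Running the scan at start-up (rather than hard-coding script names) is what would let a new script take effect without touching the application code.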

So far, I see two ways of achieving my goal. However, neither approach
fits perfectly.

1) As the layout shows, types.txt specifies the path to my type systems, so
the type system description does not need to be specified in the
application. I can create the JCas object as follows (without a type system
description):

JCas jCas = analysisEngine.newJCas();

However, for this approach to work I have to manually change
MainTypeSystem.xml, which contains my self-declared types.
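
As a rough illustration of what the manual step involves, the sketch below pulls the names of newly declared types out of a script's text with a plain regular expression. It is not the Ruta parser and makes simplifying assumptions: it only looks at DECLARE statements, treats the last word of each comma-separated entry as the new type name, drops feature lists in parentheses, and would also match a DECLARE inside a comment. The class name RutaDeclareScanner and the method declaredTypes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: naive extraction of type names from DECLARE statements.
final class RutaDeclareScanner {

    // Matches "DECLARE ... ;"; DOTALL is set so a declaration may span several lines.
    private static final Pattern DECLARE =
            Pattern.compile("\\bDECLARE\\s+([^;]+);", Pattern.DOTALL);

    static List<String> declaredTypes(String script) {
        List<String> names = new ArrayList<>();
        Matcher m = DECLARE.matcher(script);
        while (m.find()) {
            // Drop feature lists such as "(STRING lemma)" before splitting on commas.
            String body = m.group(1).replaceAll("\\([^)]*\\)", "");
            for (String part : body.split(",")) {
                String[] tokens = part.trim().split("\\s+");
                // In "DECLARE Parent Child", the last token is the new type name.
                String name = tokens[tokens.length - 1];
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String script = "DECLARE NOUNS;\nDECLARE Annotation Sentence, Clause;";
        System.out.println(declaredTypes(script));
    }
}
```

The resulting names would still have to be written into a type system description by some other means; this sketch only shows that the information is recoverable from the script itself.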

I like to add the new declared types of the ruta-script to the existing

2) While looking at the RUTA documentation and the internals of the RUTA
Workbench for Eclipse, I found that you can run ruta-scripts the following

        public static TypeSystemDescription getRutaRuleTypeSystem()
                        throws IOException, RecognitionException,
                        ResourceInitializationException, URISyntaxException {
                RutaDescriptorFactory factory = new RutaDescriptorFactory();
                RutaDescriptorInformation rd =
                                factory.parseDescriptorInformation("DECLARE NOUNS; (CW){->
                TypeSystemDescription typeSystemDescription =
                                factory.createTypeSystemDescription("test.xml", rd,
                                                new RutaBuildOptions(), null);
                return typeSystemDescription;
        }

However, this approach seems tedious to me.

