uima-user mailing list archives

From Matthias Holdorf <matthias.hold...@gmail.com>
Subject Unobtrusive integration of Ruta-Scripts in Java Application
Date Fri, 19 Aug 2016 15:59:16 GMT
Hello,

Sorry! My last mail had poor formatting.

I am developing a rather large Java application with UIMA and RUTA.

The goal is to detect and mark readability anomalies in a .docx document
with comments. For the rule part I use RUTA. While I already dealt with the
shortcomings of .docx APIs and parsing text, I'm about to fine-tune my
application.

My objective: Have a user add a ruta-script to the directory and the rule
is applied without further configuration. In the appendix you see my
directory layout for the relevant part.
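The discovery half of this objective can be sketched with the standard library alone. This is only a minimal illustration, not part of the application: the class name RutaScriptScanner, the method findRutaScripts, and the assumption that user scripts sit directly in one directory with a ".ruta" extension are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds the rule scripts a user has dropped into a directory.
class RutaScriptScanner {

    // Assumption: scripts are regular files directly inside scriptDir
    // and use the ".ruta" file extension.
    static List<Path> findRutaScripts(Path scriptDir) throws IOException {
        try (Stream<Path> entries = Files.list(scriptDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".ruta"))
                    .sorted() // stable order, so rules are applied predictably
                    .collect(Collectors.toList());
        }
    }
}
```

Each returned path could then be read and handed to whatever component builds the type system and the analysis engine, so that adding a file is the only step required of the user.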

So far, I see two ways of achieving my goal. However, neither approach
fits perfectly.

1) As you see in the layout: Having a types.txt that specifies the path to
my type systems, the type system description does not need to be specified
in the application. I can create the JCas object the following way (without
a type system description):

JCas jCas = analysisEngine.newJCas();

However, for this approach to work I have to manually change the
MainTypeSystem.xml, which contains my self-declared types.

2) While looking at the RUTA documentation and the internals of the RUTA
Workbench for Eclipse, I found that you can run ruta-scripts

    public static TypeSystemDescription getRutaRuleTypeSystem()
            throws IOException, RecognitionException, InvalidXMLException,
            ResourceInitializationException, URISyntaxException {
        RutaDescriptorFactory factory = new RutaDescriptorFactory();
        // Parse an inline script to collect its descriptor information.
        RutaDescriptorInformation rd = factory.parseDescriptorInformation(
                "DECLARE NOUNS; (CW){-> MARK(PR_TEST)};");
        // Build a type system description from the parsed script.
        TypeSystemDescription typeSystemDescription = factory.createTypeSystemDescription(
                "test.xml", rd, new RutaBuildOptions(), null);
        return typeSystemDescription;
    }
