uima-user mailing list archives

From "Matthias Holdorf" <matthias.hold...@gmail.com>
Subject Unobtrusive integration of Ruta-Scripts in Java Application
Date Fri, 19 Aug 2016 14:36:05 GMT
Hello,

I am developing a rather large Java application with UIMA and RUTA.

The goal is to detect readability anomalies in a .docx document and mark them
with comments. For the rule part I use RUTA. I have already dealt with the
shortcomings of the .docx APIs and of text parsing, and I am now fine-tuning
the application.

My objective: a user adds a Ruta script to the directory, and its rules are
applied without further configuration. The relevant directory layout looks
as follows:

src/main/resources/
META-INF/org.apache.uima.fit/types.txt (path to type systems)
ruta-script/Main.ruta
ruta-script/Nouns.ruta
type-system/Anomaly.xml
type-system/BasicTypeSystem.xml
type-system/DKProCoreTypes.xml
type-system/InternalTypeSystem.xml
type-system/MainTypeSystem.xml (where I define my own rules)
type-system/ReadabilityScore.xml
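
For the "drop a script into the directory" part of this goal, script discovery could be sketched roughly as follows. This is a minimal sketch using only the JDK, not a UIMA or RUTA API: the class name RutaScriptFinder, the method findScripts, and the default directory path are hypothetical names chosen for illustration, and it assumes the scripts are available as plain files on disk rather than packed inside a jar.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; class and method names are illustrative only.
final class RutaScriptFinder {

    /** Returns all regular files ending in ".ruta" directly inside scriptDir, sorted by path. */
    static List<Path> findScripts(Path scriptDir) throws IOException {
        // Files.list is not recursive; the stream must be closed after use.
        try (Stream<Path> entries = Files.list(scriptDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".ruta"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location, taken from the layout above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "src/main/resources/ruta-script");
        for (Path script : findScripts(dir)) {
            System.out.println(script.getFileName());
        }
    }
}
```

Each path found this way could then be read and handed to whatever mechanism builds the type system and runs the script, so that adding a file is the only step the user has to take.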

So far, I see two ways of achieving my goal. However, neither approach fits
perfectly.

1) As the layout shows, a types.txt file specifies the path to my type
systems, so the type system description does not need to be specified in the
application. I can create the JCas object as follows (without a type system
description):

JCas jCas = analysisEngine.newJCas();

However, for this approach to work I have to manually edit MainTypeSystem.xml,
which contains my self-declared types.

I would like to add the types newly declared in a Ruta script to the existing
MainTypeSystem.xml.

2) While looking at the RUTA documentation and the internals of the RUTA
Workbench for Eclipse, I found that you can generate the type system for a
Ruta script in the following way:

        public static TypeSystemDescription getRutaRuleTypeSystem()
                        throws IOException, RecognitionException, InvalidXMLException,
                        ResourceInitializationException, URISyntaxException {
                RutaDescriptorFactory factory = new RutaDescriptorFactory();
                // Parse the script text to collect its declared types.
                RutaDescriptorInformation rd = factory.parseDescriptorInformation(
                                "DECLARE NOUNS; (CW){-> MARK(PR_TEST)};");
                // Build a type system description from the parsed script.
                TypeSystemDescription typeSystemDescription = factory.createTypeSystemDescription(
                                "test.xml", rd, new RutaBuildOptions(), null);
                return typeSystemDescription;
        }

However, this approach seems tedious to me.

 

