uima-user mailing list archives

From "Monaghan, Fergal" <fergal.monag...@sap.com>
Subject RE: TextMarker language workthrough for text simplification example?
Date Mon, 19 Nov 2012 12:45:49 GMT
I've attached the descriptor ("TextSimplifier.xml": configuration for TextMarkerEngine),
the test input data ("random01.txt.xmi": annotated with ClearTK (OpenNLP)), the rules file ("rules.tm":
containing one rule, my first partial attempt at the text simplification process) and the current
output ("1.xmi": one additional annotation has been created by the rule), in case this helps.

Thanks again,

From: fergal.monaghan@sap.com
Sent: 19 November 2012 09:56
To: 'user@uima.apache.org'
Subject: TextMarker language workthrough for text simplification example?

Hi all (and especially the good folks working on TextMarker in the sandbox),

1. I am interested in implementing the type of text simplification rules set out in this paper [1].
2. I would prefer to use TextMarker (and its language) natively in UIMA rather than use the UIMA<->GATE
integration and JAPE rules.
3. I have cloned TextMarker from the repo and have configured an analysis engine descriptor
to run TextMarkerEngine using custom rules.
4. I have switched off the TextMarkerEngine seed annotations, as I am testing on pre-processed
XMI files that have already been annotated with the ClearTK type systems (up to and including
TreebankNodes; OpenNLP is used under the hood, if that's of interest).
5. Everything builds and the unit tests run fine on simple rules. Yay! Good work guys :)

Now I am focussing on customising the rules for the text simplification application. I have
been studying the TextMarker language documentation here [2] as well as TextMarker's unit
tests in the sandbox to get things working so far, but am now asking for your help to complete
one of the example rules I'd like to implement. This is the example from [1]:

Input (original):
"The jury also commented on the Fulton court, which has been under fire for its practices
in the appointment of appraisers, guardians and administrators."
Output (simplified):
"The jury also commented on the Fulton court." "The Fulton court has been under fire for its
practices in the appointment of appraisers, guardians and administrators."

Rule I want to implement in the TextMarker language:
V W:NP_ant, RelClause(X:RelPr Y), Z.  ->  V W Z. W Y.
which can be interpreted as "If a sentence consists of any text V followed by the antecedent
noun phrase W, a relative clause (consisting of a relative pronoun X and a sequence of words
Y) enclosed in commas and a sequence of words Z, then the embedded clause can be made into
a new sentence with W as the subject NP".
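As a rough illustration of what the right-hand side of this rule does once V, W, X, Y and Z have been identified, here is a small Python sketch that rebuilds the two output sentences from pre-split token lists. It is not TextMarker code and does not attempt the hard part (finding the clause boundaries); the function names and the token-list representation are hypothetical, chosen only to mirror the rule's variables.

```python
from typing import List, Tuple

# Tokens that attach to the preceding word without a space (hypothetical, minimal set)
PUNCTUATION = {",", "."}


def join_words(words: List[str]) -> str:
    """Join tokens with spaces, attaching commas and full stops to the preceding word."""
    text = ""
    for word in words:
        if text and word not in PUNCTUATION:
            text += " "
        text += word
    return text


def make_sentence(words: List[str]) -> str:
    """Capitalise the first letter and terminate with a full stop."""
    text = join_words(words)
    if not text:
        return ""
    return text[0].upper() + text[1:] + "."


def simplify(v: List[str], w: List[str], x: List[str],
             y: List[str], z: List[str]) -> Tuple[str, str]:
    """Rewrite 'V W, X Y, Z.' as the two sentences 'V W Z.' and 'W Y.'.

    x (the relative pronoun) is matched but not emitted;
    w (the antecedent NP) becomes the subject of the new sentence built from y.
    """
    return make_sentence(v + w + z), make_sentence(w + y)


if __name__ == "__main__":
    first, second = simplify(
        v="The jury also commented on".split(),
        w="the Fulton court".split(),
        x=["which"],
        y=("has been under fire for its practices in the appointment of "
           "appraisers , guardians and administrators").split(),
        z=[],  # the relative clause runs to the end of this sentence
    )
    print(first)
    print(second)
```

Run on the jury example, this prints the two simplified sentences given as the output above. Note that Y contains its own commas ("appraisers, guardians and administrators"), so the end of the relative clause cannot be found simply by looking for the next comma; that boundary has to come from the rule (or the parse tree).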

So far I have got to the following in the TextMarker language (please see below the contents of
my rules.tm file that I'm running through TextMarker). Please note this is not an attempt
at the final complete rule, but an intermediate one: the furthest I've been able
to get on my own while still passing the unit tests:

PACKAGE org.cleartk.syntax.constituent.type;

(TreebankNode{FEATURE("nodeType","NP")} TerminalTreebankNode{FEATURE("nodeType",",")} TerminalTreebankNode{FEATURE("nodeType","WDT")}
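To make the intent of the partial rule explicit: it is meant to match a TreebankNode of type NP immediately followed by a comma terminal and a WDT (wh-determiner, e.g. "which") terminal. The Python sketch below performs the same adjacency test over a hypothetical flattened list of (nodeType, text) pairs. It deliberately ignores how TextMarker handles nested or overlapping annotations, so it shows the pattern only, not TextMarker's matching semantics.

```python
from typing import List, Sequence, Tuple

# Hypothetical flattened view of a parsed sentence: (nodeType, covered text).
# Real ClearTK TreebankNode annotations form a tree; this list is a simplification.
Node = Tuple[str, str]

# Same sequence of node types as the partial rule: NP , WDT
PATTERN = ["NP", ",", "WDT"]


def find_pattern(nodes: Sequence[Node],
                 pattern: Sequence[str] = PATTERN) -> List[int]:
    """Return every start index where consecutive node types equal `pattern`."""
    n = len(pattern)
    return [
        i
        for i in range(len(nodes) - n + 1)
        if [node_type for node_type, _ in nodes[i:i + n]] == list(pattern)
    ]


if __name__ == "__main__":
    sentence = [
        ("NP", "The jury"),
        ("ADVP", "also"),
        ("VBD", "commented"),
        ("IN", "on"),
        ("NP", "the Fulton court"),
        (",", ","),
        ("WDT", "which"),
        ("VP", "has been under fire for its practices in the appointment "
               "of appraisers, guardians and administrators"),
    ]
    for start in find_pattern(sentence):
        # Show the antecedent NP (W) and the relative pronoun (X) that were matched
        print(sentence[start][1], "|", sentence[start + 2][1])
```

Completing the full rule then amounts to also capturing everything after the WDT up to the clause boundary (Y) and the remainder of the sentence (Z).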

Can someone complete this rule to get me closer to the example above? My understanding
of the TextMarker language is limited, but I feel that with an example of a slightly more complex
rule than those in the unit tests and documentation, I would be able to work out the rest of
the rules I want to implement.

Thanks very much for reading, and for any help you can provide,

Fergal Monaghan
B.E., Ph.D.   |   Research Specialist   |   SAP Research
SAP (UK) Limited   |   The Concourse   |   Queen's Road   |   Belfast BT3 9DT
T:   +44 (0)28 9078-5705   |   M:   +44 (0)79 2076-6281   |   F:   +44 (0)28 9078-5777
mailto:fergal.monaghan@sap.com   |   www.sap.com/research

[1] http://homepages.abdn.ac.uk/advaith/pages/LEC02.pdf
[2] http://tmwiki.informatik.uni-wuerzburg.de/Wiki.jsp?page=Introduction
