uima-user mailing list archives

From Peter Klügl <pklu...@uni-wuerzburg.de>
Subject Re: TextMarker language workthrough for text simplification example?
Date Mon, 19 Nov 2012 15:04:58 GMT
I can see only one attached file: TextSimplifier.xml

Can you send me the input file, the rules and the type systems?

Peter

On 19.11.2012 13:45, Monaghan, Fergal wrote:
>
> I've attached here the descriptor ("TextSimplifier.xml": configuration 
> for TextMarkerEngine), the test input data ("random01.txt.xmi": 
> Cleartk[OpenNLP] annotated), the rules file ("rules.tm": with 1 rule, 
> my first partial attempt at the text simplification process) and the 
> current output ("1.xmi": one additional tag has been created by the 
> rule), if this helps,
>
> Thanks again,
>
> Fergal.
>
> *From:* fergal.monaghan@sap.com
> *Sent:* 19 November 2012 09:56
> *To:* 'user@uima.apache.org'
> *Subject:* TextMarker language workthrough for text simplification 
> example?
>
> Hi all (and especially the good folks working on TextMarker in the 
> sandbox),
>
> 1. I am interested in implementing the type of text simplification 
> rules set out in this paper [1].
>
> 2. I would prefer to use TextMarker (and its language) natively in 
> UIMA than use the UIMA<->GATE integration and JAPE rules.
>
> 3. I have cloned TextMarker from the repo and have configured an 
> analysis engine descriptor to run TextMarkerEngine using custom rules.
>
> 4. I have switched off the TextMarkerEngine seed annotations as I am 
> testing on pre-processed XMI files that have been pre-annotated with 
> the Cleartk type systems (up to and including TreebankNodes... OpenNLP 
> used under the hood if that's of interest).
>
> 5. Things are building and unit tests running fine on simple rules. 
> Yay! Good work guys :)
>
> Now I am focussing on customising the rules for the text 
> simplification application. I have been studying the TextMarker 
> language documentation here [2] as well as TextMarker's unit tests in 
> the sandbox to get things working so far, but am now asking for your 
> help to complete one of the example rules I'd like to implement. This 
> is the example from [1]:
>
> Input (original):
>
> "The jury also commented on the Fulton court, which has been under 
> fire for its practices in the appointment of appraisers, guardians and 
> administrators."
>
> Output (simplified):
>
> "The jury also commented on the Fulton court." "The Fulton court has 
> been under fire for its practices in the appointment of appraisers, 
> guardians and administrators."
>
> Rule I want to implement in the TextMarker language:
>
> V W:NP_ant, Rel Clause(X:Rel Pr Y), Z. -> V W Z. W Y.
>
> which can be interpreted as "If a sentence consists of any text V 
> followed by the antecedent noun phrase W, a relative clause 
> (consisting of a relative pronoun X and a sequence of words Y) 
> enclosed in commas and a sequence of words Z, then the embedded clause 
> can be made into a new sentence with W as the subject NP".
>
> So far I have gotten to this in the TextMarker language (please see 
> below the contents of my rules.tm file that I'm running through 
> TextMarker). Please note this itself is not an attempt at the final 
> complete rule, but some intermediate attempt that is the furthest I've 
> been able to get on my own which still passes unit tests:
>
> ===============================================
>
> PACKAGE org.cleartk.syntax.constituent.type;
>
> (TreebankNode{FEATURE("nodeType","NP")} 
> TerminalTreebankNode{FEATURE("nodeType",",")} 
> TerminalTreebankNode{FEATURE("nodeType","WDT")} 
> TreebankNode{FEATURE("nodeType","S")}){->MARK(com.sap.research.bd.ta.AdjectivalOrRelativeClause)};
>
> ===============================================
>
> Can someone complete this rule to get me closer to the example above? 
> I lack understanding of the TextMarker language, but I feel that if I 
> had an example of a rule slightly more complex than those in the unit 
> tests/documentation, I would be able to work out the rest of the rules 
> I want to implement.
>
> Thanks very much for reading, and for any help you can provide,
>
> *Fergal Monaghan*
> B.E., Ph.D.   |   Research Specialist   |   SAP Research
> *SAP (UK) Limited*   |   The Concourse   |   Queen's Road   |   
> Belfast BT3 9DT
>
> T: +44 (0)28 9078-5705   |   M:   +44 (0)79 2076-6281   | F:   +44 
> (0)28 9078-5777
>
> mailto:fergal.monaghan@sap.com | www.sap.com/research
>
> [1] http://homepages.abdn.ac.uk/advaith/pages/LEC02.pdf
>
> [2] http://tmwiki.informatik.uni-wuerzburg.de/Wiki.jsp?page=Introduction
>
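
For reference, the string-level effect of the quoted rule (V W, X Y, Z. -> V W Z. W Y.) can be sketched outside TextMarker. The following is only a minimal Python illustration, assuming the spans V, W, Y and Z have already been identified (for example from the parse annotations). The function name simplify_relative_clause is hypothetical and is not part of TextMarker or UIMA.

```python
def simplify_relative_clause(v, w, y, z=""):
    """Apply 'V W, X Y, Z. -> V W Z. W Y.' to spans that are already split.

    v: text before the antecedent noun phrase
    w: the antecedent noun phrase
    y: the words of the relative clause after the relative pronoun X
    z: the words following the relative clause (may be empty)
    The relative pronoun X itself is dropped by the rule.
    """
    def sentence(*parts):
        # Join the non-empty spans, capitalise the first letter, add a full stop.
        text = " ".join(p.strip() for p in parts if p.strip())
        return text[0].upper() + text[1:] + "."

    return [sentence(v, w, z), sentence(w, y)]


# Spans taken by hand from the example sentence in the original message.
v = "The jury also commented on"
w = "the Fulton court"
y = ("has been under fire for its practices in the appointment of "
     "appraisers, guardians and administrators")

first, second = simplify_relative_clause(v, w, y)
print(first)
print(second)
```

In this example Z is empty, so the first output sentence is just V followed by W, and the second sentence reuses W as the subject of the former relative clause.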

