Message-ID: <50AA680D.8050801@uni-wuerzburg.de>
Date: Mon, 19 Nov 2012 18:10:37 +0100
From: Peter Klügl
To: user@uima.apache.org
Subject: Re: TextMarker language workthrough for text simplification example?
References: <6A29AB78FB4DC74D85AB78BD0265D54E0704A6B1@DEWDFEMB17B.global.corp.sap> <6A29AB78FB4DC74D85AB78BD0265D54E0704A801@DEWDFEMB17B.global.corp.sap> <50AA4A9A.7020106@uni-wuerzburg.de>
In-Reply-To: <50AA4A9A.7020106@uni-wuerzburg.de>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------060901030806060002070103"

--------------060901030806060002070103
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

Hi Fergal,

I played around a bit and attached the resulting TextMarker project.

The first part of the script is only there for creating some
annotations. I haven't used ClearTK for a while and was too lazy to
update it. The main part with the block looks at sentences with WDTs and
creates some annotations. The rules with REPLACE are used to remember
the changes, and the rule with EXEC(Modifier) creates a new view with the
changed document.

The changes are located in the view named "modified":

"The jury also commented on the Fulton court, which has been under fire
for its practices in the appointment of appraisers."
...becomes...
"The jury also commented on the Fulton court . the Fulton court has been
under fire for its practices in the appointment of appraisers."
(I removed some words, because I included no correct identification of
relative clauses)

"Peter, who just woke up, goes to work."
...becomes...
"Peter goes to work. Peter just woke up."

It's only a fast and ugly hack, but I hope this helps a bit.
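
For readers without the attachment: the following is not the TextMarker
script, only a rough plain-Python sketch of the "V W, X Y, Z. -> V W Z. W Y."
rewrite applied to the two examples above. The function name, the short
pronoun list and the idea of passing the antecedent W in by hand are all
assumptions made for illustration; a real pipeline would take W and the
clause boundaries from the parser annotations, and this sketch does not
cope with commas inside the relative clause.

```python
import re

# Assumption: a tiny illustrative list; a real system would rely on POS tags
# such as WDT rather than on surface forms.
RELATIVE_PRONOUNS = ("which", "who", "that")


def split_relative_clause(sentence, antecedent):
    """Rewrite 'V W, X Y, Z.' as ['V W Z.', 'W Y.'].

    `antecedent` is the noun phrase W, supplied by the caller as a
    hypothetical stand-in for an NP annotation. The sentence is returned
    unchanged (as a one-element list) if the pattern does not match.
    """
    pattern = re.compile(
        r"^(?P<v>.*?)" + re.escape(antecedent)
        + r",\s+(?:" + "|".join(RELATIVE_PRONOUNS) + r")\s+"
        + r"(?P<y>[^,]+?)"             # Y: clause body without inner commas
        + r"(?:,\s*(?P<z>.*?))?\.?$"   # optional ', Z' and the final period
    )
    match = pattern.match(sentence)
    if match is None:
        return [sentence]
    v, y, z = match.group("v"), match.group("y"), match.group("z")
    main = v + antecedent + (" " + z if z else "") + "."
    extra = antecedent + " " + y + "."
    # Capitalise only the first character so the rest of W keeps its case.
    extra = extra[0].upper() + extra[1:]
    return [main, extra]
```

Calling split_relative_clause("Peter, who just woke up, goes to work.",
"Peter") returns ["Peter goes to work.", "Peter just woke up."]. A sentence
without a matching relative clause comes back unchanged.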
If you have any questions, just ask :-)

Best,

Peter

On 19.11.2012 16:04, Peter Klügl wrote:
> I can see only one attached file: TextSimplifier.xml
>
> Can you send me the input file, the rules and the type systems.
>
> Peter
>
> On 19.11.2012 13:45, Monaghan, Fergal wrote:
>>
>> I've attached here the descriptor ("TextSimplifier.xml":
>> configuration for TextMarkerEngine), the test input data
>> ("random01.txt.xmi": Cleartk[OpenNLP] annotated), the rules file
>> ("rules.tm": with 1 rule, my first partial attempt at the text
>> simplification process) and the current output ("1.xmi": one
>> additional tag has been created by the rule), if this helps,
>>
>> Thanks again,
>>
>> Fergal.
>>
>> *From:* fergal.monaghan@sap.com
>> *Sent:* 19 November 2012 09:56
>> *To:* 'user@uima.apache.org'
>> *Subject:* TextMarker language workthrough for text simplification
>> example?
>>
>> Hi all (and especially the good folks working on TextMarker in the
>> sandbox),
>>
>> 1. I am interested in implementing the type of text simplification
>> rules set out in this paper [1].
>>
>> 2. I would prefer to use TextMarker (and its language) natively in
>> UIMA than use the UIMA<->GATE integration and JAPE rules.
>>
>> 3. I have cloned TextMarker from the repo and have configured an
>> analysis engine descriptor to run TextMarkerEngine using custom rules.
>>
>> 4. I have switched off the TextMarkerEngine seed annotations as I am
>> testing on pre-processed XMI files that have been pre-annotated with
>> the Cleartk type systems (up to and including TreebankNodes...
>> OpenNLP used under the hood if that's of interest).
>>
>> 5. Things are building and unit tests running fine on simple rules.
>> Yay! Good work guys :)
>>
>> Now I am focussing on customising the rules for the text
>> simplification application.
>> I have been studying the TextMarker
>> language documentation here [2] as well as TextMarker's unit tests in
>> the sandbox to get things working so far, but am now asking for your
>> help to complete one of the example rules I'd like to implement. This
>> is the example from [1]:
>>
>> Input (original):
>>
>> "The jury also commented on the Fulton court, which has been under
>> fire for its practices in the appointment of appraisers, guardians
>> and administrators."
>>
>> Output (simplified):
>>
>> "The jury also commented on the Fulton court." "The Fulton court has
>> been under fire for its practices in the appointment of appraisers,
>> guardians and administrators."
>>
>> Rule I want to implement in the TextMarker language:
>>
>> V W:NP_ant, Rel Clause(X:Rel Pr Y), Z. -> V W Z. W Y.
>>
>> which can be interpreted as "If a sentence consists of any text V
>> followed by the antecedent noun phrase W, a relative clause
>> (consisting of a relative pronoun X and a sequence of words Y)
>> enclosed in commas and a sequence of words Z, then the embedded
>> clause can be made into a new sentence with W as the subject NP".
>>
>> So far I have gotten to this in the TextMarker language (please see
>> below the contents of my rules.tm file that I'm running through
>> TextMarker). Please note this itself is not an attempt at the final
>> complete rule, but some intermediate attempt that is the furthest
>> I've been able to get on my own which still passes unit tests:
>>
>> ===============================================
>>
>> PACKAGE org.cleartk.syntax.constituent.type;
>>
>> (TreebankNode{FEATURE("nodeType","NP")}
>> TerminalTreebankNode{FEATURE("nodeType",",")}
>> TerminalTreebankNode{FEATURE("nodeType","WDT")}
>> TreebankNode{FEATURE("nodeType","S")}){->MARK(com.sap.research.bd.ta.AdjectivalOrRelativeClause)};
>>
>> ===============================================
>>
>> Can someone complete this rule to get me closer to the example above?
>> I lack understanding of the TextMarker language, but I feel that if I
>> had an example of this slightly more complex rule than what is
>> present in the unit tests/documentation, that I would be able to work
>> it out for the rest of the rules I want to implement.
>>
>> Thanks very much for reading, and for any help you can provide,
>>
>> *Fergal Monaghan*
>> B.E., Ph.D. | Research Specialist | SAP Research
>> *SAP (UK) Limited* | The Concourse | Queen's Road | Belfast
>> BT3 9DT
>>
>> T: +44 (0)28 9078-5705 | M: +44 (0)79 2076-6281 | F: +44
>> (0)28 9078-5777
>>
>> mailto:fergal.monaghan@sap.com | www.sap.com/research
>> __
>>
>> [1] http://homepages.abdn.ac.uk/advaith/pages/LEC02.pdf
>>
>> [2] http://tmwiki.informatik.uni-wuerzburg.de/Wiki.jsp?page=Introduction
>>
>

--------------060901030806060002070103--