From: "Armando Stellato"
Subject: Re: annotator based on regular expressions over (previous) annotations: state-of-work in UIMA?
Date: Sat, 17 Jan 2009 01:16:33 +0100

Hi Igor,

thanks for the pointer. I've had a quick read through your LREC paper:

http://domino.research.ibm.com/comm/research_projects.nsf/pages/medicalinformatics.pubs.html/$FILE/CFE_sominsky-A4.pdf

and a presentation I found on the Web:

http://watchtower.coling.uni-jena.de/~coling/uimaws_lrec2008/slides/sominsky_20080531_talk_CFE.pdf

At first glance, it seems quite different from what I need. FESL is (I hope I'm not abusing the term :-) ) a transformer of UIMA features. The target may be new UIMA features or other kinds of data (judging from the title of the paper and the example in figure 3, which suggests its use in Machine Learning: useful information is extracted from the existing annotations and fed to a learner). However, I tried to understand it better, because it could still have the power to do what I am looking for, which is to apply regular expressions over the content of a document, where the elements of the expressions are not only strings, digits, etc., but also Annotation types.
For example (with a very simple syntax), a rule such as:

.* { }

would create a new Annotation called Person when matching the string "Mr John Doe" (previously annotated with PersonTitle and Name annotations).

Finally, I think I found the problem: in the paper you mention regular expressions as one of the 5 filters that can be applied to evaluate values (upper right part of page 3 of the paper), but the overall search mechanism (points a) to f), upper left part of page 3) is not based on regular expressions nor, I think, does it have their power (though I will delve into the details of point f) with further reading).

From what I gathered from the reading, I think it is not what I need, though it could surely be included as part of it. For example (again with a simple syntax):

.* {} "salary" :normalizedvalue > 300000€

to extract instances of RichPerson.

If I missed some crucial aspect, please let me know.

Thanks in advance,

Armando Stellato

> -----Original Message-----
> From: Igor Sominsky [mailto:sominsky@gmail.com]
> Sent: Friday, 16 January 2009 22:59
> To: uima-user@incubator.apache.org
> Subject: Re: annotator based on regular expressions over (previous)
> annotations: state-of-work in UIMA?
>
> Armando,
>
> In the posted version of CFE you can alter the value of an extracted feature by
> applying a Java regular expression. The code that is currently under
> development would allow combining several values by using Java regular
> expressions or math expressions. The grammar of math expressions includes the
> capability to use Java functions and constants (through reflection).
>
> I hope that answers your question. Please let me know if you need more
> information.
>
> Thanks,
> Igor
>
>
> ----- Original Message -----
> From: "Armando Stellato"
> To: "UIMA"
> Sent: Friday, January 16, 2009 1:19 PM
> Subject: annotator based on regular expressions over (previous) annotations:
> state-of-work in UIMA?
>
>
> > Hi all,
> >
> > From a few posts, like the one at the following link:
> >
> > http://osdir.com/ml/apache.uima.general/2008-05/msg00070.html
> >
> > it seems that there is some interest in seeing this kind of processor in the
> > UIMA array of available components.
> >
> > Since we're considering developing a new one, but would prefer
> > not to reinvent the wheel :-), I'm asking whether someone is already doing
> > the same and, if so, for pointers to their work and whether it is
> > available, still a work in progress, etc.
> >
> > Best regards,
> >
> > Armando Stellato
> >
> > --------------------------------------------------
> > Ing. Armando Stellato, PhD
> > AI Research Group,
> > Dept. of Computer Science, Systems and Production
> > University of Roma, Tor Vergata
> > Via del Politecnico 1 00133 ROMA (ITALY)
> > tel: +39 06 7259 7330 (office, room A1-14);
> > +39 06 7259 7332 (lab)
> > fax: +39 06 7259 7460
> > e_mail: stellato@info.uniroma2.it
> > yahoo: stellato75
> > jabber(gtalk): stellato75@gmail.com
> > skype: odnamar
> > --------------------------------------------------
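
To make the idea discussed in this thread more concrete (regular expressions whose elements are annotation types rather than characters), here is a minimal Python sketch. It is not UIMA code and it does not reflect how CFE works. The Annotation class, the toy_annotate helper, the apply_rule function and the <TypeName> rule notation are all hypothetical names introduced only for this illustration. The sketch assumes non-overlapping annotations and encodes the ordered sequence of annotation types as a string, so that an ordinary regular expression can be run over it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for an annotation: a type name plus a text span."""
    type_name: str
    begin: int
    end: int


def toy_annotate(text):
    """Stand-in for upstream annotators (assumption: tiny hard-coded lexicons)."""
    titles = {"Mr", "Mrs", "Dr"}
    names = {"John", "Doe", "Jane", "Roe"}
    annotations = []
    for m in re.finditer(r"\w+", text):
        word = m.group()
        if word in titles:
            kind = "PersonTitle"
        elif word in names:
            kind = "Name"
        else:
            kind = "Token"  # everything else, so that adjacency is meaningful
        annotations.append(Annotation(kind, m.start(), m.end()))
    return annotations


def apply_rule(annotations, rule, new_type):
    """Match a regex over annotation types and build new annotations.

    `rule` is an ordinary regular expression in which <TypeName> stands for
    one annotation of that type (hypothetical notation for this sketch).
    """
    ordered = sorted(annotations, key=lambda a: a.begin)
    rule_types = re.findall(r"<(\w+)>", rule)
    all_types = sorted({a.type_name for a in ordered} | set(rule_types))
    # One private-use character per type: the annotation sequence becomes a string.
    symbol = {t: chr(0xE000 + i) for i, t in enumerate(all_types)}
    encoded = "".join(symbol[a.type_name] for a in ordered)
    # Replace each <TypeName> by its symbol; VERBOSE lets the rule contain spaces.
    compiled = re.compile(
        re.sub(r"<(\w+)>", lambda m: symbol[m.group(1)], rule), re.VERBOSE
    )
    results = []
    for m in compiled.finditer(encoded):
        if m.start() == m.end():
            continue  # ignore empty matches
        first, last = ordered[m.start()], ordered[m.end() - 1]
        results.append(Annotation(new_type, first.begin, last.end))
    return results


text = "Mr John Doe met Dr Jane Roe."
people = apply_rule(toy_annotate(text), "<PersonTitle> <Name>+", "Person")
print([text[p.begin:p.end] for p in people])
```

The string encoding keeps the sketch short, but it only sees one annotation per position. Handling overlapping annotations, or conditions on feature values such as the normalized salary in the RichPerson example, would need a matcher that works on the annotations themselves rather than on an encoded string.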