uima-user mailing list archives

From "Armando Stellato" <stell...@info.uniroma2.it>
Subject Re: annotator based on regular expressions over (previous) annotations: state-of-work in UIMA?
Date Sat, 17 Jan 2009 00:16:33 GMT
Hi Igor,

thanks for the pointer. I've had a quick read through your LREC paper:

http://domino.research.ibm.com/comm/research_projects.nsf/pages/medicalinformatics.pubs.html/$FILE/CFE_sominsky-A4.pdf

and a presentation I found on the Web:

http://watchtower.coling.uni-jena.de/~coling/uimaws_lrec2008/slides/sominsky_20080531_talk_CFE.pdf

At first glance, it seems quite different from what I need. FESL is a transformer (I hope
I am not abusing the term :-) ) of UIMA features. The target may be new UIMA features
or other kinds of data (judging from the title of the paper and the example in figure 3, which
suggest its use in machine learning: useful information is extracted from the existing
annotations and fed to a learner). Still, I tried to understand it better, because it could
nonetheless have the power to do what I am looking for, which is to apply regular expressions
over the content of a document, where the elements of the expressions are not only strings,
digits, etc. but also Annotation types. For example (with a very simple syntax), the rule:
.* {<PersonTitle> <Name>}
would create a new Annotation called Person when matching the string "Mr John Doe"
(previously annotated with PersonTitle and Name annotations).
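
As a rough illustration of the idea (this is neither CFE nor an existing UIMA component), the sketch below encodes the sequence of annotation types as a string of "<Type>" tokens and runs an ordinary regular expression over that string. The Annotation class, the apply_rule function and the offsets are all hypothetical, and the + quantifier is added here on the assumption that each name token carries its own Name annotation. Only patterns built from "<Type>" tokens are handled, and overlapping annotations are simply linearized by start offset.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for a UIMA annotation: a type plus text offsets."""
    type: str
    begin: int
    end: int


def apply_rule(annotations, pattern, new_type):
    """Create new_type annotations wherever pattern matches the type sequence."""
    ordered = sorted(annotations, key=lambda a: (a.begin, a.end))
    # Encode the annotation sequence as a string of "<Type>" tokens and
    # remember which annotation each token boundary belongs to.
    encoded, starts, ends = "", {}, {}
    for index, ann in enumerate(ordered):
        starts[len(encoded)] = index
        encoded += f"<{ann.type}>"
        ends[len(encoded)] = index
    # Group each token so that quantifiers such as + apply to the whole token.
    regex = re.sub(r"<(\w+)>", r"(?:<\1>)", pattern.replace(" ", ""))
    results = []
    for m in re.finditer(regex, encoded):
        if m.start() == m.end():
            continue  # ignore empty matches
        first, last = ordered[starts[m.start()]], ordered[ends[m.end()]]
        results.append(Annotation(new_type, first.begin, last.end))
    return results


text = "Mr John Doe"
existing = [
    Annotation("PersonTitle", 0, 2),  # "Mr"
    Annotation("Name", 3, 7),         # "John"
    Annotation("Name", 8, 11),        # "Doe"
]
people = apply_rule(existing, "<PersonTitle> <Name>+", "Person")
for p in people:
    print(p.type, repr(text[p.begin:p.end]))
```

Encoding types as tokens lets the standard regex engine supply alternation and quantifiers for free, at the cost of ignoring the text between annotations.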
Finally, I think I found the issue: in the paper you mention regular expressions as one of the
5 filters that can be applied to evaluate values (upper right part of page 3), but the
overall search mechanism (points a) to f), upper left part of page 3) is not based on
regular expressions nor, I think, does it have their power (though I will look into the details
of point f) with further reading).

Based on what I gathered from the reading, I think it is not what I need, though it could
surely be included as part of it. For example (again with a simple syntax):

.* {<Person>} "salary" <Currency>:normalizedvalue > 300000€

to extract instances of RichPerson.
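
A minimal sketch of how such a rule could combine an annotation sequence, a literal word and a constraint on a feature value is given below. Everything in it is an assumption made for illustration: the Annotation class, the find_rich_persons function, the sample sentence, and the reading of "salary" as the only text between the Person and Currency annotations (whitespace ignored). The feature name normalizedvalue is taken from the rule above.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-in for a UIMA annotation with a feature map."""
    type: str
    begin: int
    end: int
    features: dict = field(default_factory=dict)


def find_rich_persons(text, annotations, threshold=300000):
    """Return RichPerson annotations for: {<Person>} "salary" <Currency> > threshold."""
    ordered = sorted(annotations, key=lambda a: a.begin)
    results = []
    # Look at each pair of consecutive annotations in document order.
    for person, currency in zip(ordered, ordered[1:]):
        if person.type != "Person" or currency.type != "Currency":
            continue
        # The literal word must be the only text between the two annotations.
        if text[person.end:currency.begin].strip() != "salary":
            continue
        if currency.features.get("normalizedvalue", 0) > threshold:
            results.append(Annotation("RichPerson", person.begin, person.end))
    return results


text = "John Doe salary 500000€ and Jane Roe salary 20000€"


def span(fragment):
    """Locate a fragment in the sample text and return its (begin, end) offsets."""
    begin = text.index(fragment)
    return begin, begin + len(fragment)


existing = [
    Annotation("Person", *span("John Doe")),
    Annotation("Currency", *span("500000€"), {"normalizedvalue": 500000}),
    Annotation("Person", *span("Jane Roe")),
    Annotation("Currency", *span("20000€"), {"normalizedvalue": 20000}),
]
rich = find_rich_persons(text, existing)
print([text[r.begin:r.end] for r in rich])
```

The new annotation covers only the Person span, mirroring the braces in the rule, which mark the part of the match to be annotated.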

If I missed some crucial aspect, please let me know,

Thanks in advance,

Armando Stellato


> -----Original Message-----
> From: Igor Sominsky [mailto:sominsky@gmail.com]
> Sent: Friday, 16 January 2009 22:59
> To: uima-user@incubator.apache.org
> Subject: Re: annotator based on regular expressions over (previous)
> annotations: state-of-work in UIMA?
> 
> Armando,
> 
> In the posted version of CFE you can alter the value of an extracted feature by
> applying a Java regular expression. The code that is currently under
> development would allow combining several values by using Java regular
> expressions or math expressions. The grammar of math expressions includes
> the capability to use Java functions and constants (through reflection).
> 
> I hope that answers your question. Please let me know if you need more
> information.
> 
> Thanks,
> Igor
> 
> 
> ----- Original Message -----
> From: "Armando Stellato" <stellato@info.uniroma2.it>
> To: "UIMA" <uima-user@incubator.apache.org>
> Sent: Friday, January 16, 2009 1:19 PM
> Subject: annotator based on regular expressions over (previous) annotations:
> state-of-work in UIMA?
> 
> 
> > Hi all,
> >
> > From a few posts, like the one at the following link:
> >
> > http://osdir.com/ml/apache.uima.general/2008-05/msg00070.html
> >
> > it seems that there is some interest in seeing this kind of processor
> > among the available UIMA components.
> >
> > Since we're considering developing a new one, but would prefer
> > not to reinvent the wheel :-), I'm asking whether someone is already doing
> > the same and, if so, to get pointers to their work and to know whether it
> > is available, still a work in progress, etc.
> >
> > Best regards,
> >
> > Armando Stellato
> >
> > --------------------------------------------------
> > Ing. Armando Stellato, PhD
> > AI Research Group,
> > Dept. of Computer Science, Systems and Production
> > University of Roma, Tor Vergata
> > Via del Politecnico 1 00133 ROMA (ITALY)
> > tel: +39 06 7259 7330 (office, room A1-14);
> >     +39 06 7259 7332 (lab)
> > fax: +39 06 7259 7460
> > e_mail: stellato@info.uniroma2.it
> > yahoo: stellato75
> > jabber(gtalk): stellato75@gmail.com <mailto:starred75@gmail.com>
> > skype: odnamar
> > --------------------------------------------------

