uima-user mailing list archives

From "D.J. McCloskey" <dj_mcclos...@ie.ibm.com>
Subject Re: R: annotator based on regular expressions over (previous) annotations: state-of-work in UIMA?
Date Sat, 17 Jan 2009 11:41:20 GMT
Hi Armando,

Have you had a look at the LanguageWare technology on IBM's alphaWorks site
(http://www.alphaworks.ibm.com/tech/lrw)? It seems really close to what you
are looking for.

What is posted there is an Eclipse-based workbench for configuring an
aggregate analyzer, i.e. rules and dictionaries that drive a UIMA pipeline
consisting of language identification, lexical analysis with linguistic
normalization, POS tagging, and a finite-state-transducer-based rule
annotator that operates over annotations and features in the CAS.

The UI doesn't expose all the capabilities in the underlying annotators but
I'd be really interested to have your opinions about it.
Feel free to contact us through the mail address in the FAQ for specifics.

Regards,
-DJ
-------------------
D.J McCloskey
IBM LanguageWare Architect

... our external website:
http://www-306.ibm.com/software/globalization/topics/languageware/index.jsp
... our Alphaworks: http://www.alphaworks.ibm.com/tech/lrw
... our Wikipedia: http://en.wikipedia.org/wiki/Languageware

IBM Ireland Product Distribution Limited registered in Ireland with number
92815.  Registered office: Oldbrook House, 24-32 Pembroke Road,
Ballsbridge, Dublin 4


                                                                                         
                                                
From:    "Armando Stellato" <stellato@info.uniroma2.it>
To:      <uima-user@incubator.apache.org>
Date:    17/01/2009 00:18
Subject: R: annotator based on regular expressions over (previous) annotations: state-of-work in UIMA?





Hi Igor,

thanks for the pointer. I've had a quick read through your LREC paper:

http://domino.research.ibm.com/comm/research_projects.nsf/pages/medicalinformatics.pubs.html/$FILE/CFE_sominsky-A4.pdf


and a presentation I found on the Web:

http://watchtower.coling.uni-jena.de/~coling/uimaws_lrec2008/slides/sominsky_20080531_talk_CFE.pdf


At first glance, it seemed quite different from what I needed. FESL is (I
hope I am not abusing the term :-) ) a transformer of UIMA features. The
target may be new UIMA features or other kinds of data (judging by the title
of the paper and the example in figure 3, which suggests its use in machine
learning: useful information is extracted from the existing annotations and
fed to a learner). However, I tried to understand it better, because it could
still have the power to do what I was looking for, which is to apply regular
expressions over the content of a document, where the elements of the
expressions are not only strings, digits, etc., but also Annotation types.
For example (with a very simple syntax), the rule:
.* {<PersonTitle> <Name>}
will extract a new Annotation called Person when it matches the string
"Mr John Doe" (previously annotated with PersonTitle and Name annotations).
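
To make the idea concrete, here is a minimal sketch in Python. It is not UIMA code and not how any existing component works: the Annotation class, the apply_rule function, and the "<Type>" encoding of the rule are all assumptions made up for illustration. The sketch writes the sequence of annotation types as a string, runs an ordinary regular expression over that string, and maps each match back to document offsets.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for a CAS annotation: a type plus a text span."""
    type: str
    begin: int
    end: int


def apply_rule(annotations, type_pattern, new_type):
    """Run a regex over the sequence of annotation types.

    Each annotation is encoded as "<Type>", so type_pattern is a normal
    regex over strings such as "<PersonTitle><Name>".
    """
    ordered = sorted(annotations, key=lambda a: (a.begin, a.end))
    encoded = ""
    starts, ends = {}, {}  # offset in the encoded string -> annotation index
    for i, ann in enumerate(ordered):
        starts[len(encoded)] = i
        encoded += f"<{ann.type}>"
        ends[len(encoded)] = i

    results = []
    for m in re.finditer(type_pattern, encoded):
        # Keep only non-empty matches aligned with whole annotations.
        if m.end() > m.start() and m.start() in starts and m.end() in ends:
            first = ordered[starts[m.start()]]
            last = ordered[ends[m.end()]]
            results.append(Annotation(new_type, first.begin, last.end))
    return results


text = "Mr John Doe"
annotations = [Annotation("PersonTitle", 0, 2), Annotation("Name", 3, 11)]
persons = apply_rule(annotations, r"<PersonTitle><Name>", "Person")
```

Because the rule is a plain regex over type names, quantifiers and alternation (for instance `(<PersonTitle>)?<Name>`) come for free. The sketch ignores overlapping annotations and features.
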

Lastly, I think I found the problem: in the paper you mention regular
expressions as one of the 5 filters that can be applied to evaluate values
(upper right part of page 3 of the paper), but the overall search mechanism
(points a) to f), upper left part of page 3) is not based on regular
expressions nor, I think, does it have their power (though I will delve into
the details of point f) with further reading).

From what I have read so far, I think it is not what I need, though it could
certainly be included as part of it. For example (again with a simple
syntax):

.* {<Person>} "salary" <Currency>:normalizedvalue > 300000€

would extract instances of RichPerson.
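
A rule like this mixes three kinds of elements: an annotation type, a literal string, and a constraint on a feature value. One possible way to sketch that in Python is shown below. It is not UIMA or CFE code: the Annotation class, the Token type used for the literal "salary", the find_rich_persons function, and the sample sentence are all assumptions made up for illustration. Each rule element becomes a predicate, and the predicates are tested against consecutive annotations.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-in for a CAS annotation with feature values."""
    type: str
    begin: int
    end: int
    features: dict = field(default_factory=dict)


def find_rich_persons(text, annotations, threshold=300_000):
    """Match <Person> "salary" <Currency> with normalizedvalue > threshold."""
    ordered = sorted(annotations, key=lambda a: a.begin)
    steps = [
        lambda a: a.type == "Person",
        lambda a: a.type == "Token" and text[a.begin:a.end] == "salary",
        lambda a: a.type == "Currency"
        and a.features.get("normalizedvalue", 0) > threshold,
    ]
    found = []
    for i in range(len(ordered) - len(steps) + 1):
        window = ordered[i:i + len(steps)]
        if all(step(a) for step, a in zip(steps, window)):
            person = window[0]  # the braces in the rule capture the Person span
            found.append(Annotation("RichPerson", person.begin, person.end))
    return found


text = "Mr John Doe salary 350000 EUR and Ms Jane Roe salary 20000 EUR"
annotations = [
    Annotation("Person", 0, 11),
    Annotation("Token", 12, 18),
    Annotation("Currency", 19, 29, {"normalizedvalue": 350000}),
    Annotation("Token", 30, 33),
    Annotation("Person", 34, 45),
    Annotation("Token", 46, 52),
    Annotation("Currency", 53, 62, {"normalizedvalue": 20000}),
]
rich = find_rich_persons(text, annotations)
```

Only the first person qualifies here, because the second salary is below the threshold. A fixed sequence of predicates is weaker than a full regular expression (no repetition or alternation), so this only illustrates the feature-constraint part of the rule.
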

If I missed some crucial aspect, please let me know,

Thanks in advance,

Armando Stellato


> -----Original Message-----
> From: Igor Sominsky [mailto:sominsky@gmail.com]
> Sent: Friday, 16 January 2009 22:59
> To: uima-user@incubator.apache.org
> Subject: Re: annotator based on regular expressions over (previous)
> annotations: state-of-work in UIMA?
>
> Armando,
>
> In the posted version of CFE you can alter the value of an extracted
> feature by applying a Java regular expression. The code that is currently
> under development would allow combining several values by using Java
> regular expressions or math expressions. The grammar of math expressions
> includes the capability to use Java functions and constants (through
> reflection).
>
> I hope that answers your question. Please let me know if you need more
> information.
>
> Thanks,
> Igor
>
>
> ----- Original Message -----
> From: "Armando Stellato" <stellato@info.uniroma2.it>
> To: "UIMA" <uima-user@incubator.apache.org>
> Sent: Friday, January 16, 2009 1:19 PM
> Subject: annotator based on regular expressions over (previous)
> annotations: state-of-work in UIMA?
>
>
> > Hi all,
> >
> >
> >
> > From a few posts, like the one at the following link:
> >
> >
> >
> > http://osdir.com/ml/apache.uima.general/2008-05/msg00070.html
> >
> >
> >
> > it seems that there is some interest in seeing this kind of processor in
> > the UIMA array of available components.
> >
> >
> >
> > Since we're considering developing a new one, but would prefer not to
> > reinvent the wheel :-), I'm asking whether someone is already doing the
> > same and, if so, for pointers to their work: whether it is available,
> > whether it is still a work in progress, etc.
> >
> >
> >
> > Best regards,
> >
> >
> >
> > Armando Stellato
> >
> >
> >
> > --------------------------------------------------
> >
> >
> >
> > Ing. Armando Stellato, PhD
> >
> > AI Research Group,
> >
> > Dept. of Computer Science, Systems and Production
> >
> > University of Roma, Tor Vergata
> >
> > Via del Politecnico 1 00133 ROMA (ITALY)
> >
> > tel: +39 06 7259 7330 (office, room A1-14);
> >
> >     +39 06 7259 7332 (lab)
> >
> > fax: +39 06 7259 7460
> >
> > e_mail: stellato@info.uniroma2.it
> >
> > yahoo: stellato75
> >
> > jabber(gtalk): stellato75@gmail.com
> >
> > skype: odnamar
> >
> >
> >
> > --------------------------------------------------
> >
> >
> >
> >
