uima-dev mailing list archives

From Peter Klügl (JIRA) <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-3382) Ruta: REGEXP from WORDLIST
Date Fri, 25 Oct 2013 17:16:32 GMT

    [ https://issues.apache.org/jira/browse/UIMA-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805473#comment-13805473 ]

Peter Klügl commented on UIMA-3382:

Short answer: no, that is not possible with word lists, but...

The word lists in UIMA Ruta are designed to work as a dictionary that is sensitive to the current
filtering settings of your rule script. There are several kinds of word lists, but internally they
are represented as a trie, a tree structure of chars. This can greatly improve performance,
but makes the use of regular expressions impossible (at least as it is implemented right
now).
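
To illustrate why a trie lookup cannot interpret regex syntax, here is a minimal Python sketch of a character trie. It is not UIMA Ruta's actual implementation; the helper names `build_trie` and `contains` and the sample entries are made up for illustration. Every character of an entry, including "." and "?", is stored and compared literally:

```python
# Minimal sketch of a character trie; NOT UIMA Ruta's real implementation.
# build_trie and contains are hypothetical helper names.

def build_trie(words):
    """Build a nested-dict trie; the key None marks the end of an entry."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def contains(trie, text):
    """Return True if text is stored in the trie, compared char by char."""
    node = trie
    for ch in text:
        if ch not in node:
            return False
        node = node[ch]
    return None in node


# Hypothetical word list: one plain entry and one entry written as a regex.
trie = build_trie(["aspirin", "azithromyci.?"])

print(contains(trie, "aspirin"))        # plain entry is found
print(contains(trie, "azithromycin"))   # not found: "." is a literal dot here
print(contains(trie, "azithromyci.?"))  # found only as the literal string
```

This mirrors the behaviour described in the issue: only entries without special regex syntax produce matches.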

UIMA Ruta also supports simple regexp rules: http://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.language.regexprule

You could use them to create a list of regular expressions in a Ruta script like:

"azithromyci.?"-> Medication;
"azithromyci.+susp"-> Medication;
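
To preview what these two patterns would cover before putting them into a script, a quick check outside of Ruta can help. The sketch below uses Python's re module and assumes that these simple constructs (".?" and ".+") behave the same there as in Ruta's regexp rules, which is plausible but not guaranteed in general. The function name `find_medications` and the sample sentence are made up for illustration:

```python
import re

# The two patterns from the regexp rules above.
patterns = [r"azithromyci.?", r"azithromyci.+susp"]
compiled = [re.compile(p) for p in patterns]


def find_medications(text):
    """Return (start, end, matched_text) for every pattern match in text."""
    spans = []
    for rx in compiled:
        for m in rx.finditer(text):
            spans.append((m.start(), m.end(), m.group()))
    return sorted(spans)


# Hypothetical sample sentence.
text = "Patient takes azithromycin oral susp daily."
for span in find_medications(text):
    print(span)
```

Note that ".+" is greedy, so the second pattern extends to the last "susp" it can reach in the text.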

btw, you can also ask questions on the mailing lists.

> --------------------------
>                 Key: UIMA-3382
>                 URL: https://issues.apache.org/jira/browse/UIMA-3382
>             Project: UIMA
>          Issue Type: Question
>          Components: ruta
>            Reporter: Olga Patterson
>            Priority: Trivial
>              Labels: features
> WORDLIST is defined as a list of text items. What if I have a list of regular expressions
> that I want to mark as the same type. Is there a command that would do it?
> My use case is that I want to find medication statements in text, but there is a large
> variation in spelling and dose description, so regular expressions are a more concise way
> to cover all possible cases. So I was trying to use syntax below, but the only matches were
> for those cases where there was a single word without special regex syntax.
> DECLARE Medication;
> WORDLIST MedicationList='Medication.regex';
> Paragraph{-> MARKFAST(Medication, MedicationList, true,1)};
> What can I change in the rules so that the items in MedicationList are treated as regular
> expressions?
> Thank you.

This message was sent by Atlassian JIRA
