uima-user mailing list archives

From Radwen ANIBA <arad...@gmail.com>
Subject Regular Expression
Date Sat, 20 Jun 2009 16:51:42 GMT
Hi,

I have a question about the regular expression example annotator that comes
with one of the UIMA tutorial examples. It searches an input text for
regular expressions like these:

Pattern UniverseProductNumbers =
    Pattern.compile("\\b[U][A-Z][A-Z]-\\d\\d\\d\\d\\d\\b");
Matcher matcher = UniverseProductNumbers.matcher(txt);
int pos = 0;
while (matcher.find(pos)) {
    ProductNumber productNumberAnnotation = new ProductNumber(aJCas);
    productNumberAnnotation.setProductLine("Universe");
    productNumberAnnotation.setBegin(matcher.start());
    productNumberAnnotation.setEnd(matcher.end());
    productNumberAnnotation.addToIndexes();
    pos = matcher.end();
}

Pattern BeyondProductNumbers =
    Pattern.compile("\\b[B][A-Z][A-Z]-\\d\\d\\d\\b");
matcher = BeyondProductNumbers.matcher(txt);
pos = 0;
while (matcher.find(pos)) {
    ProductNumber productNumberAnnotation = new ProductNumber(aJCas);
    productNumberAnnotation.setProductLine("Beyond");
    productNumberAnnotation.setBegin(matcher.start());
    productNumberAnnotation.setEnd(matcher.end());
    productNumberAnnotation.addToIndexes();
    pos = matcher.end();
}

With two patterns this is simple to deal with, but imagine we have 1000
regular expressions to search for in a text. Is there any way to load the
regular expressions from a file and then process them one by one? To take
the same example, could we have a file with two tab-separated columns
containing, for example:

Universe    "\\b[U][A-Z][A-Z]-\\d\\d\\d\\d\\d\\b"
Beyond      "\\b[B][A-Z][A-Z]-\\d\\d\\d\\b"

and do the same job dynamically? Then, to update a regular expression or
add a new one, we would only have to edit the regular expression file
instead of the source code of the annotator.
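
To make the idea concrete, here is a rough sketch of the file-driven
approach in plain Java, without the UIMA classes. Everything in it is an
assumption for illustration only: the class name PatternTable, the Hit
holder, the "#" comment convention, and the file layout (one
name<TAB>regex pair per line). Note that a file read at runtime would hold
the regular expression with single backslashes and no quotes, since the
doubled backslashes are only needed inside Java string literals.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA: loads "name<TAB>regex" lines.
public class PatternTable {

    // One match: product line name plus begin/end offsets in the text.
    public static final class Hit {
        public final String productLine;
        public final int begin;
        public final int end;

        Hit(String productLine, int begin, int end) {
            this.productLine = productLine;
            this.begin = begin;
            this.end = end;
        }
    }

    // Reads the pattern file and compiles every regex once.
    public static Map<String, Pattern> load(Path file) throws IOException {
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    // Parses lines of the assumed form: name<TAB>regex.
    // Blank lines and lines starting with '#' are skipped (an assumption).
    public static Map<String, Pattern> parse(List<String> lines) {
        Map<String, Pattern> table = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] cols = line.split("\t", 2);
            if (cols.length != 2) {
                throw new IllegalArgumentException(
                    "Expected name<TAB>regex but got: " + line);
            }
            table.put(cols[0].trim(), Pattern.compile(cols[1].trim()));
        }
        return table;
    }

    // Runs every pattern over the text and collects all matches,
    // grouped by pattern in file order.
    public static List<Hit> findAll(Map<String, Pattern> table, String text) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : table.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                hits.add(new Hit(entry.getKey(), matcher.start(), matcher.end()));
            }
        }
        return hits;
    }
}
```

Inside the annotator, the loop over the hits would presumably create one
ProductNumber per hit, with setProductLine(hit.productLine),
setBegin(hit.begin) and setEnd(hit.end), exactly as in the hard-coded
version above. Loading the table once at initialization rather than on
every document would avoid recompiling 1000 patterns each time.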

Thank you

Radwen
