uima-user mailing list archives

From Dominik Terweh <d.ter...@drooms.com>
Subject Matching REPLACEd text
Date Mon, 20 May 2019 07:19:39 GMT
Dear All,

I am using UIMA to detect certain parts of contracts. Unfortunately, the documents are not
originals but scans, and the OCR introduces a rather high percentage of errors. Furthermore,
in some situations I would like to get the root or lemma of a word and match on that basis.
I thought the best solution for both of these problems would be the REPLACE() action, but
unfortunately I cannot get it to work.

What I would like to achieve, given the sentences:
“They worked hard”,
“They were warking hard”,
“He vvorks hard”,
“I work hard”
I would like to perform some OCR correction (“warking” -> “working”, “vvorks” -> “works”), like:
                WrongWord{-> REPLACE(CorrectWord)};
and some stemming/lemmatizing (“working”, “works”, “worked” -> “work”), like:
                Word{-> REPLACE(Stem)};
After that, I would like to match on the replaced text by simply using the stems, like:
                ANY "work" "hard"{-> MARK(WhatIWant, 2, 3)};
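
To make the intended behaviour concrete, here is a minimal sketch in plain Python (not Ruta, and not a claim about how REPLACE works). It normalizes each token through two lookup tables, matches on the normalized forms, and reports the match as offsets into the original text, so that the original wording can still be highlighted. The tables `OCR_FIXES` and `STEMS` and the function names are hypothetical and only cover the example sentences; a real pipeline would presumably use a spell checker and a stemmer instead.

```python
import re

# Hypothetical lookup tables, covering only the example sentences.
OCR_FIXES = {"warking": "working", "vvorks": "works"}
STEMS = {"working": "work", "works": "work", "worked": "work"}


def normalize(word):
    """Lower-case the word, apply the OCR fix, then reduce it to its stem."""
    w = word.lower()
    w = OCR_FIXES.get(w, w)
    return STEMS.get(w, w)


def tokenize(text):
    """Return (begin, end, normalized) per word; offsets refer to the original text."""
    return [(m.start(), m.end(), normalize(m.group()))
            for m in re.finditer(r"\w+", text)]


def find_work_hard(text):
    """Mirror the rule: any token, then 'work', then 'hard'; mark elements 2 and 3.

    Returns (begin, end) spans in the ORIGINAL text.
    """
    tokens = tokenize(text)
    spans = []
    for _, (b2, _, n2), (_, e3, n3) in zip(tokens, tokens[1:], tokens[2:]):
        if n2 == "work" and n3 == "hard":
            spans.append((b2, e3))
    return spans


if __name__ == "__main__":
    for sentence in ["They worked hard", "They were warking hard",
                     "He vvorks hard", "I work hard"]:
        for begin, end in find_work_hard(sentence):
            # Print the matched span as it appears in the uncorrected text.
            print(sentence, "->", repr(sentence[begin:end]))
```

The point of the sketch is that the corrected form is stored next to the token rather than written into the text, so the offsets of the original document never change.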

Now my main questions are:

  *   Is it possible to match on replaced text?
  *   If so, can I highlight it in the original text?
  *   Can I see the changed text in the Annotation Browser View?
  *   Do I first need to write the outcome to a file and then reread and process it?

I hope you can help me with my request,

Dominik Terweh


Drooms GmbH
Eschersheimer Landstraße 6
60322 Frankfurt, Germany

Mail:   d.terweh@drooms.com

