uima-user mailing list archives

From Brian & Fran <bnfmch...@gmail.com>
Subject Help with Alphanumeric Tokens
Date Thu, 21 Dec 2017 14:39:00 GMT
Good day, Peter,

We are learning UIMA Ruta and are having some problems with it. As I posted on Stack Overflow,
our documents contain a lot of data that does not fit the traditional natural-language mold:
alphanumeric data such as file hashes, email addresses, domains, etc.
We tried to rework the JFlex lexer and rebuild ruta-core, but are now struggling to get
it working in the Ruta Workbench. Is there a better way to parse out and annotate such data?
A file containing sentences or tabular data with MD5 hashes would be a great example.

Thank you,

Sent from my iPhone
