uima-user mailing list archives

From Peter Klügl <pklu...@uni-wuerzburg.de>
Subject Re: RTF Annotator
Date Tue, 03 Sep 2013 19:53:34 GMT
Hi,

What we use is something like JODConverter, or a simple bridge to
Microsoft Word or OpenOffice, to convert the document (RTF or
DOC/DOCX) to HTML. We then apply the HTMLAnnotator and HTMLConverter of
UIMA Ruta to get plain text with annotations for the HTML tags.
However, we do not have an (available) analysis engine for this complete
process.
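
To illustrate the second step, here is a minimal sketch of the idea of
turning HTML into plain text plus offset annotations for the tags. It is
not the UIMA Ruta implementation and does not use any UIMA API; it only
uses the Python standard library, and the names TagSpanExtractor and
html_to_annotated_text are made up for this example. It also assumes
reasonably well-formed HTML as input.

```python
from html.parser import HTMLParser


class TagSpanExtractor(HTMLParser):
    """Collect visible text and record one (tag, begin, end) span per element.

    Hypothetical helper for illustration only; offsets refer to the
    plain text that is produced, not to the HTML source.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # text chunks seen so far
        self.length = 0      # current offset in the plain text
        self.open_tags = []  # stack of (tag, begin_offset)
        self.spans = []      # finished (tag, begin, end) spans

    def handle_starttag(self, tag, attrs):
        # Remember where in the plain text this element starts.
        self.open_tags.append((tag, self.length))

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; stray end tags are ignored.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                _, begin = self.open_tags[i]
                del self.open_tags[i:]  # also drops unclosed inner tags
                self.spans.append((tag, begin, self.length))
                break

    def handle_data(self, data):
        self.parts.append(data)
        self.length += len(data)


def html_to_annotated_text(html):
    """Return (plain_text, spans) for an HTML string."""
    parser = TagSpanExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts), parser.spans


text, spans = html_to_annotated_text("<p>Total: <b>42 mg</b> daily</p>")
for tag, begin, end in spans:
    print(tag, begin, end, repr(text[begin:end]))
```

Each span plays the role that a tag annotation would play in a CAS:
formatting such as bold survives as a begin/end pair over the plain
text, so later rules can refer to it.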

Best,

Peter

On 01.09.2013 23:42, Dave Kincaid wrote:
> Before I embark on building an RTF annotator I thought I'd ask around a bit to
> see if anyone had built such a thing. Most of the documents I have to handle
> are in RTF format. I can pretty easily extract the text only using something
> like Apache Tika, but there is important information in the formatting as well
> (bold, italic, font sizes, centering, tables, etc) that I'd like to use. Is
> anyone aware of a UIMA annotator that does this already?
>
> Thanks,
>
> Dave Kincaid
>

