uima-user mailing list archives

From Julien Nioche <lists.digitalpeb...@gmail.com>
Subject Re: UIMA- Support for HTML, PDF, Doc files
Date Thu, 29 Sep 2011 08:51:40 GMT

Have a look at the TikaAnnotator in the UIMA sandbox. It extracts the text and
metadata from various document formats and converts any available markup
into annotations.
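The reply names the TikaAnnotator but does not show what "converting markup into annotations" means. Below is a minimal Java sketch (Java 16+, for records) of that general idea. It strips tags from a simple HTML string and records each element as a stand-off annotation with begin/end character offsets into the extracted text. UIMA annotations refer to the document text in the same way. The class and method names (`MarkupToAnnotations`, `parse`) are hypothetical. This is not the TikaAnnotator's implementation, and a real pipeline would rely on Tika's parsers rather than a hand-rolled tag scanner.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of "markup -> annotations"; NOT the TikaAnnotator's code.
// Handles only simple tags; entities, scripts and comments containing '>' are not handled.
public class MarkupToAnnotations {

    // A stand-off annotation: element name plus [begin, end) offsets into the plain text.
    public record Annotation(String name, int begin, int end) {}

    // The extracted plain text together with the annotations that point into it.
    public record Result(String text, List<Annotation> annotations) {}

    public static Result parse(String markup) {
        StringBuilder text = new StringBuilder();
        List<Annotation> annotations = new ArrayList<>();
        Deque<String> openNames = new ArrayDeque<>();   // names of currently open elements
        Deque<Integer> openBegins = new ArrayDeque<>(); // text offset where each one started

        int i = 0;
        while (i < markup.length()) {
            char c = markup.charAt(i);
            int close = (c == '<') ? markup.indexOf('>', i) : -1;
            if (close < 0) {
                // Ordinary character (or an unterminated '<'): copy it to the text.
                text.append(c);
                i++;
                continue;
            }
            String tag = markup.substring(i + 1, close).trim();
            i = close + 1;
            if (tag.isEmpty() || tag.startsWith("!") || tag.startsWith("?")) {
                continue; // skip doctype, simple comments and processing instructions
            }
            if (tag.startsWith("/")) {
                // Closing tag: emit an annotation if it matches the innermost open element.
                String name = tagName(tag.substring(1));
                if (!openNames.isEmpty() && openNames.peek().equals(name)) {
                    annotations.add(new Annotation(openNames.pop(), openBegins.pop(), text.length()));
                }
            } else if (tag.endsWith("/")) {
                // Self-closing tag such as <br/>: a zero-length annotation.
                String name = tagName(tag.substring(0, tag.length() - 1));
                annotations.add(new Annotation(name, text.length(), text.length()));
            } else {
                // Opening tag: remember where it started in the plain text.
                openNames.push(tagName(tag));
                openBegins.push(text.length());
            }
        }
        // Close any elements left open at the end of the input.
        while (!openNames.isEmpty()) {
            annotations.add(new Annotation(openNames.pop(), openBegins.pop(), text.length()));
        }
        return new Result(text.toString(), annotations);
    }

    // Element name = first token of the tag body, lower-cased (attributes are dropped).
    private static String tagName(String tagBody) {
        return tagBody.trim().split("\\s+", 2)[0].toLowerCase();
    }

    public static void main(String[] args) {
        Result r = parse("<p>Hello <b>UIMA</b> world</p>");
        System.out.println(r.text());
        for (Annotation a : r.annotations()) {
            System.out.println(a.name() + " [" + a.begin() + ", " + a.end() + ") "
                    + r.text().substring(a.begin(), a.end()));
        }
    }
}
```

Running `main` prints `Hello UIMA world`, then `b [6, 10) UIMA` and `p [0, 16) Hello UIMA world`. An annotation is emitted when its element closes, so inner elements are listed before the elements that contain them.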



On 29 September 2011 07:28, abhishek <abhishek.k@sqlstar.com> wrote:

> Hi,
> While reading the UIMA documentation, I found that UIMA supports HTML
> files.
>
> However, when I run the
> org.apache.uima.tools.docanalyzer.DocumentAnalyzer class, it fails to
> understand the text.
>
> Kindly let me know the correct way to read these types of files.

Open Source Solutions for Text Engineering

