uima-user mailing list archives

From Julien Nioche <lists.digitalpeb...@gmail.com>
Subject Re: UIMA- Support for HTML, PDF, Doc files
Date Thu, 29 Sep 2011 08:51:40 GMT
Hi,

Have a look at the TikaAnnotator in the sandbox. It extracts the text and
metadata from various document formats and converts any available markup
into annotations.
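
To illustrate the general idea (plain text out, markup kept as spans over
that text), here is a minimal sketch. It deliberately does not use the
TikaAnnotator, Tika or UIMA APIs: it relies only on the HTML parser that
ships with the JDK, and the names HtmlMarkupSketch, Span and Extracted are
made up for this example. Treat it as an approximation of the concept, not
as how the TikaAnnotator is implemented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class HtmlMarkupSketch {

    /** Hypothetical stand-in for an annotation: a tag name over a span of the plain text. */
    static final class Span {
        final String tag;
        final int begin;
        final int end;

        Span(String tag, int begin, int end) {
            this.tag = tag;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return tag + "[" + begin + "," + end + ")";
        }
    }

    /** The extracted plain text together with the markup spans found in it. */
    static final class Extracted {
        final String text;
        final List<Span> spans;

        Extracted(String text, List<Span> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    static Extracted extract(String html) {
        final StringBuilder text = new StringBuilder();
        final List<Span> spans = new ArrayList<>();
        final Deque<Span> open = new ArrayDeque<>(); // start tags not yet closed

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data); // only character data reaches the text buffer
            }

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Remember where in the plain text this element begins.
                open.push(new Span(t.toString(), text.length(), -1));
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                // Close the most recently opened element with the same name.
                for (Iterator<Span> it = open.iterator(); it.hasNext();) {
                    Span s = it.next();
                    if (s.tag.equals(t.toString())) {
                        it.remove();
                        spans.add(new Span(s.tag, s.begin, text.length()));
                        break;
                    }
                }
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Extracted(text.toString(), spans);
    }

    public static void main(String[] args) {
        Extracted r = extract("<html><body><p>Hello <b>UIMA</b></p></body></html>");
        System.out.println(r.text);
        System.out.println(r.spans);
    }
}
```

In a real pipeline the extracted text would become the document text of the
CAS and each span an annotation over it, so that downstream annotators only
ever see plain text.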

HTH

Julien


On 29 September 2011 07:28, abhishek <abhishek.k@sqlstar.com> wrote:

> Hi,
> While reading the UIMA documentation, I found that UIMA supports
> HTML files.
>
> However, when I run the
> org.apache.uima.tools.docanalyzer.DocumentAnalyzer class, it fails to
> understand the text.
>
> Kindly let me know the correct way to read these types of files.




-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
