uima-user mailing list archives

From Roman Klinger <roman.klin...@scai.fraunhofer.de>
Subject Re: parsing html as a string from getDocumentText
Date Wed, 12 Mar 2008 18:18:29 GMT
Dear Sam,

Sam Fisher wrote:
> I'm probably not using Jericho 
> correctly, because the output of the parser is the same as what went in 
> (not stripped down to only the text content).

I also think so ;-). I experimented with Jericho in UIMA and did not 
have any problems.

> Has anyone had success using jericho with uima?

How did you use Jericho?

I did not have any problems with

new Source(new StringReader("<html>Te<b>s</b>t</html>")).getTextExtractor();

or in UIMA with

new Source(new StringReader(jCas.getDocumentText())).getTextExtractor();
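For completeness, here is a minimal sketch of how the calls above could be wrapped to get a plain String back. It assumes a Jericho 3.x jar on the classpath (package net.htmlparser.jericho; older 2.x releases used a different package name). The class name JerichoTextSketch and the helper extractText are made up for illustration and are not part of Jericho or UIMA.

```java
import java.io.IOException;
import java.io.StringReader;

import net.htmlparser.jericho.Source;

// Hypothetical wrapper class, for illustration only.
final class JerichoTextSketch {

    // Hypothetical helper: returns the text content of an HTML string.
    static String extractText(String html) throws IOException {
        // Source(Reader) reads the whole input and may throw IOException.
        Source source = new Source(new StringReader(html));
        // getTextExtractor() only creates the extractor; calling toString()
        // on it performs the extraction and returns the text without tags.
        return source.getTextExtractor().toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extractText("<html>Te<b>s</b>t</html>"));
    }
}
```

Inside an annotator the same helper could be called as extractText(jCas.getDocumentText()). If the result still looks like the input, one thing worth checking is that toString() is called on the TextExtractor and not on the Source itself, since the Source's toString() returns the original markup.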

Best regards,

Roman Klinger
Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
Schloss Birlinghoven
D-53754 Sankt Augustin
Tel.: +49-2241-14-2360
Fax.: +49-2241-14-4-2360
email: roman.klinger@scai.fhg.de
