uima-user mailing list archives

From Sam Fisher <safi...@gmail.com>
Subject Re: parsing html as a string from getDocumentText
Date Wed, 12 Mar 2008 18:50:00 GMT
Hi Roman,

You confirmed I wasn't losing my mind, but also that I had neglected the 
configuration of my AE: I was running the Whiteboard2 flow controller 
instead of a fixed flow, so another, older annotator that I meant to 
discard was writing into it. Time to throw out the trash! (Works fine now.)

Good learning experience. Thanks very much for your help.

-Sam

Roman Klinger wrote:
> Dear Sam,
>
> Sam Fisher wrote:
>> I'm probably not using Jericho correctly, because the output of the 
>> parser is the same as what went in (not stripped down to only the 
>> text content).
>>
>>   
>
> I also think so ;-). I experimented with Jericho in UIMA and did not 
> have any problems.
>
>> Has anyone had success using jericho with uima?
>>
>>   
>
> How did you use Jericho?
>
> I did not have any problems with
>
> new Source(new StringReader("<html>Te<b>s</b>t</html>")).getTextExtractor();
>
> or in UIMA with
>
> new Source(new StringReader(jCas.getDocumentText())).getTextExtractor();
>
>
> Best regards,
> Roman
>
>
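Roman's sanity check (feed a small HTML string to a parser and confirm the tags are gone) can also be run outside UIMA with nothing but the JDK. The sketch below is not Jericho: it swaps in the JDK's built-in Swing HTML parser (javax.swing.text.html.parser.ParserDelegator) as an assumption, and the class HtmlTextSketch and method extractText are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: a JDK-only stand-in for Jericho's text extraction.
// It is NOT the Jericho API; it uses the Swing HTML parser shipped with the JDK.
public class HtmlTextSketch {

    // Returns the character content of an HTML string with the tags removed.
    public static String extractText(String html) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of text found between tags.
                text.append(data);
            }
        };
        try {
            // true = ignore any charset declaration; the input is already a Java String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // A StringReader should not fail, but parse() declares IOException.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Inside a UIMA annotator the input would instead be jCas.getDocumentText().
        System.out.println(extractText("<html>Te<b>s</b>t</html>"));
    }
}
```

If a standalone check like this strips the tags but the CAS still ends up holding the original HTML, the cause may lie elsewhere in the pipeline, as it did here with the flow controller.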
