uima-user mailing list archives

From: Marshall Schor <...@schor.com>
Subject: Re: setBegin and setEnd when annotating a text
Date: Wed, 24 Jun 2009 17:32:05 GMT
Not sure what you're doing here, but a couple of thoughts:

1) If the //// is meant to separate "documents" and you want to process
each one separately, you could have a reader component (e.g. a UIMA
collection reader) read the original data, split it into separate docs,
and put each one in its own CAS to be processed independently.

2) If you just need to reset the begin and end of an annotation, these
are settable fields, so you can set them to whatever you want. But be
aware that some methods (e.g. getCoveredText()) use these values to
locate the characters in the Subject of Analysis (Sofa), and if you change
the begin and end, other code will likely break.
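For suggestion 1, here is a minimal plain-Java sketch of the splitting step only, assuming a separator is a line containing exactly "////". It does not call any UIMA API; SeparatorSplitter and splitIntoDocuments are hypothetical names. In a real collection reader each returned piece would become the document text of its own CAS, so every annotation's begin/end would naturally be relative to that piece.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SeparatorSplitter {
    // Assumption: a separator is a line that contains exactly "////".
    // The optional \R on each side also swallows the line breaks next to it,
    // so no piece keeps the newline that was adjacent to a separator.
    private static final Pattern SEPARATOR_LINE = Pattern.compile("(?m)\\R?^////$\\R?");

    /** Splits the raw input into the pieces that would each go into their own CAS. */
    static List<String> splitIntoDocuments(String raw) {
        return Arrays.asList(SEPARATOR_LINE.split(raw));
    }

    public static void main(String[] args) {
        String raw = "lsdkqjqldjqslkdjqsldkjqsldqjsjjjjjqslkdjqslkdjqlsdkjqlsdkj\n"
            + "////\n"
            + "qsmdkqsmdlkqsmdkqjjjjqsùmdlqsùdlqsùdml\n"
            + "////\n"
            + "dsdjjjjjqsùmdlùqld";
        List<String> docs = splitIntoDocuments(raw);
        for (int i = 0; i < docs.size(); i++) {
            // In a real collection reader, each piece would become the document
            // text of a separate CAS (e.g. cas.setDocumentText(piece) in getNext(CAS)),
            // so begin/end offsets are automatically relative to that piece.
            System.out.println("doc " + i + ": " + docs.get(i));
        }
    }
}
```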
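For suggestion 2, an alternative that avoids changing begin/end at all: keep each annotation's document-level offsets (so getCoveredText() keeps working) and derive the line-relative offsets the question asks for by subtracting the start offset of the segment that contains each match. This is a plain-Java sketch assuming "////" separator lines; RelativeOffsets, Hit, segmentStarts and find are hypothetical names, and the only UIMA call (the Annotation(JCas, int, int) constructor) appears in a comment, not in executed code. Offsets here are 0-based and end-exclusive, like UIMA's begin/end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelativeOffsets {
    // Assumption: a separator is a line containing exactly "////".
    private static final Pattern SEPARATOR_LINE = Pattern.compile("(?m)^////$\\R?");

    /** One match: document-level offsets plus offsets relative to its segment. */
    static final class Hit {
        final int begin, end, segment, relBegin, relEnd;

        Hit(int begin, int end, int segment, int segStart) {
            this.begin = begin;
            this.end = end;
            this.segment = segment;
            this.relBegin = begin - segStart;
            this.relEnd = end - segStart;
        }

        @Override
        public String toString() {
            return "segment " + segment + ": doc [" + begin + "," + end
                + ") rel [" + relBegin + "," + relEnd + ")";
        }
    }

    /** Start offset of each segment: 0, plus the position just after every separator line. */
    static List<Integer> segmentStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        Matcher m = SEPARATOR_LINE.matcher(text);
        while (m.find()) {
            starts.add(m.end());
        }
        return starts;
    }

    /** Finds matches over the whole text and attaches segment-relative offsets to each. */
    static List<Hit> find(String text, Pattern target) {
        List<Integer> starts = segmentStarts(text);
        List<Hit> hits = new ArrayList<>();
        Matcher m = target.matcher(text);
        while (m.find()) {
            // The owning segment is the last one that starts at or before the match.
            int seg = 0;
            while (seg + 1 < starts.size() && starts.get(seg + 1) <= m.start()) {
                seg++;
            }
            // In a UIMA annotator, m.start()/m.end() are what you would give the
            // annotation (e.g. new Annotation(jcas, m.start(), m.end())), so
            // getCoveredText() still works; relative values are derived separately.
            hits.add(new Hit(m.start(), m.end(), seg, starts.get(seg)));
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "lsdkqjqldjqslkdjqsldkjqsldqjsjjjjjqslkdjqslkdjqlsdkjqlsdkj\n"
            + "////\n"
            + "qsmdkqsmdlkqsmdkqjjjjqsùmdlqsùdlqsùdml\n"
            + "////\n"
            + "dsdjjjjjqsùmdlùqld";
        for (Hit h : find(text, Pattern.compile("jjjj"))) {
            System.out.println(h);
        }
    }
}
```

With the sample text, the match in the second text block comes out as rel [17,21), i.e. 0-based and end-exclusive.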

-Marshall

Radwen ANIBA wrote:
> Hi
>
> Well, I have to look for a regular expression within a file, but I need to
> adjust the begin and end values a little, since the document itself contains
> text like this:
>
>
> lsdkqjqldjqslkdjqsldkjqsldqjsjjjjjqslkdjqslkdjqlsdkjqlsdkj
> ////
> qsmdkqsmdlkqsmdkqjjjjqsùmdlqsùdlqsùdml
> ////
> dsdjjjjjqsùmdlùqld
>
> The text I'm looking for is "jjjj", but the separator //// MUST reset the
> begin and end values to zero each time we meet it in the document. I mean,
> for the first line we must have jjjj found at [30-34], in line 3 it is
> found at [18-21], etc., so the begin and end values are relative to the line
> and not to the entire document.
>
> Does anyone have a way to modify the begin and end values?
>
> Thx
>
> Rad
>
>   
