uima-user mailing list archives

From Christian Mauceri <mauc...@hermeneute.com>
Subject Re: R: Problema with Document Analyzer
Date Mon, 26 Feb 2007 15:47:05 GMT
Hi Gianluca,

I had a similar problem when analyzing a document through
jcas.getSofa().getSofaDataStream(); when I use jcas.getDocumentText()
the offsets are in sync. Since using an input stream was not crucial for
me, I did not investigate further. Could it be caused by the SGML
entities used to escape the '<' and '>' characters?
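
Both explanations in this thread (escaped '<' and '>' here, CRLF newlines in the quoted reply below) come down to the same thing: offsets computed against one form of the text are applied to another form of different length. The following is only a minimal sketch in plain Java to illustrate that drift. It uses no UIMA classes, and the class and method names (OffsetDrift, toLfOffset, decodeAngleEntities) are hypothetical, not part of any UIMA or Swing API. It assumes the viewer sees every "\r\n" as a single character, which is the hypothesis raised in the thread, not a confirmed diagnosis. Note that String.replace(CharSequence, CharSequence) needs Java 5 or later.

```java
// Illustration only: shows how offsets drift between two forms of one text.
final class OffsetDrift {

    // Maps an offset into `raw` (CRLF line endings) to the matching offset in
    // a copy of the text where every "\r\n" has been collapsed to "\n".
    // Assumes 0 <= rawOffset <= raw.length().
    static int toLfOffset(String raw, int rawOffset) {
        int crlfPairs = 0;
        // Count only the CRLF pairs that lie entirely before rawOffset.
        for (int i = 0; i + 1 < rawOffset; i++) {
            if (raw.charAt(i) == '\r' && raw.charAt(i + 1) == '\n') {
                crlfPairs++;
            }
        }
        return rawOffset - crlfPairs;
    }

    // Decodes only the two entities mentioned in the message above.
    static String decodeAngleEntities(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">");
    }

    public static void main(String[] args) {
        String raw = "line one\r\nline two\r\nUIMA here";
        int begin = raw.indexOf("UIMA");          // offset in the CRLF text
        int shown = toLfOffset(raw, begin);       // offset in the LF-only text
        System.out.println("raw offset = " + begin + ", LF offset = " + shown);

        String escaped = "a &lt;b&gt; UIMA";
        String decoded = decodeAngleEntities(escaped);
        System.out.println("escaped offset = " + escaped.indexOf("UIMA")
                + ", decoded offset = " + decoded.indexOf("UIMA"));
    }
}
```

In the first case the offset shrinks by one per preceding line break, so a highlight drawn with the raw offset would land further and further to the right of the intended text as the document goes on, which is consistent with the symptom described below.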


Gianluca Mameli wrote:
>> What versions of UIMA and Java are you using, and what is your OS?
>>     
>
> Java version is 1.4.2_13-b06.
>
> OS is Windows XP Professional SP2
>
> UIMA version is UIMA_SDK_2_0_2 (I downloaded it from the
> http://www.alphaworks.ibm.com/tech/uima/download site).
>
> Gianluca
>
>   
>> This may be an issue with the Swing JTextArea widget used in the
>> DocumentAnalyzer not dealing with newlines properly.  The sample text
>> files have CRLF newlines but the viewer may count these as just 1
>> character for purposes of computing offsets.  I've heard of similar
>> problems before on older versions of Java.
>>     
>
> Has anyone solved this problem?
>
> Gianluca
>
>
>   
>> -----Original Message-----
>> From: lally.adam@gmail.com [mailto:lally.adam@gmail.com] On behalf of Adam Lally
>> Sent: Friday, 23 February 2007 19:44
>> To: uima-user@incubator.apache.org; Gianluca Mameli
>> Subject: Re: Problema with Document Analyzer
>>
>> Hi Gianluca,
>>
>> In general you need to be subscribed to the list in order to post, but
>> I've manually allowed this message through.  To subscribe send mail to
>> uima-user-subscribe@incubator.apache.org.
>>
>> On 2/23/07, Gianluca Mameli <gmameli@cogito.expertsystem.it> wrote:
>>     
>>> <snip/>
>>> I tried to parse the text in the process method (of my
>>> JCasAnnotator_ImplBase implementation), setting the begin and end of
>>> my annotation with Java's String indexOf method, etc. It finds the
>>> exact text, but when the Document Analyzer shows me the result, the
>>> annotation is not correct (the highlighted text is shifted by some offset).
>>>
>>>       
>> What versions of UIMA and Java are you using, and what is your OS?
>>
>> This may be an issue with the Swing JTextArea widget used in the
>> DocumentAnalyzer not dealing with newlines properly.  The sample text
>> files have CRLF newlines but the viewer may count these as just 1
>> character for purposes of computing offsets.  I've heard of similar
>> problems before on older versions of Java.
>>
>> Regards,
>>   -Adam
>>     
>
>   

-- 
Cordialement/Regards
Christian Mauceri
http://hermeneute.com/Christian

