uima-user mailing list archives

From "Pascal Coupet" <pascal.cou...@temis.com>
Subject RE: Bewildered
Date Thu, 06 Mar 2008 17:05:36 GMT
Hi Dennis,

You will not be able to analyze documents in binary format directly
using the UIMA samples. There is no default converter included in the
package, so you should first convert them into text files (using "Save
As", for example) or implement an external converter within your
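
As a rough illustration of a pre-filtering step (this is not part of the UIMA samples), the Java sketch below skips files that look binary before they are handed to a collection reader. The NUL-byte heuristic and the names `TextFileFilter`, `looksLikeText` and `textFilesIn` are hypothetical. Files such as JPEGs or Word .doc files usually contain 0x0 bytes near the start, but the heuristic also rejects UTF-16 text, so a real converter that extracts the text would be the more robust route.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TextFileFilter {
    // Hypothetical heuristic (not UIMA behaviour): a file is treated as
    // binary if any of its first SAMPLE_SIZE bytes is NUL (0x0).
    static final int SAMPLE_SIZE = 8192;

    static boolean looksLikeText(byte[] sample, int length) {
        for (int i = 0; i < length; i++) {
            if (sample[i] == 0) {
                return false;
            }
        }
        return true;
    }

    static boolean looksLikeText(Path file) throws IOException {
        byte[] buffer = new byte[SAMPLE_SIZE];
        try (InputStream in = Files.newInputStream(file)) {
            int total = 0;
            int n;
            // Fill the buffer until it is full or the file ends (read returns -1).
            while (total < SAMPLE_SIZE
                    && (n = in.read(buffer, total, SAMPLE_SIZE - total)) > 0) {
                total += n;
            }
            return looksLikeText(buffer, total);
        }
    }

    // Returns the regular files in dir that pass the heuristic.
    static List<Path> textFilesIn(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                if (looksLikeText(p)) {
                    result.add(p);
                } else {
                    System.err.println("Skipping binary file: " + p);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : textFilesIn(dir)) {
            System.out.println(p);
        }
    }
}
```

The surviving paths could then be copied into the input directory the tutorial points at, leaving binary files behind.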


-----Original Message-----
From: Dennis Geller [mailto:dgeller@aptima.com] 
Sent: Thursday, March 06, 2008 10:10 AM
To: uima-user@incubator.apache.org
Subject: Re: Bewildered

Sorry that I was unclear. The bad characters appeared when I took the
compiled tutorial and pointed it at a directory of mine, rather than the
one that came with the tutorial (no data problems in there!).

It could be that there was a JPEG in the directory, or an embedded image in
a Word document. I'll follow up on that tutorial reference. Thanks.
>> I just almost had a successful run. However, it coughed because a 
>> file had a "non-XML character, 0x0." 
> Where was this character?  If it was in your XML descriptors, then 
> that needs to be corrected.  It is possible to analyze arbitrary data,
> including "byte" data containing any characters, in UIMA; see 
>> This also happened when I was running the unmodified tutorial
> Can you say where this character occurred in the unmodified tutorial 
> example?
> -Marshall
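
If the 0x0 comes from document content rather than from a descriptor, one possible workaround (an assumption, not something UIMA does automatically) is to replace characters that XML 1.0 forbids before the text is passed to the CAS, for example via `JCas.setDocumentText`. The sketch below checks each code point against the XML 1.0 `Char` production; the names `XmlCharFilter`, `isXmlChar` and `sanitize` are hypothetical.

```java
public class XmlCharFilter {
    // XML 1.0 "Char" production:
    //   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of text in which every code point that XML 1.0
    // forbids (e.g. 0x0, or an unpaired surrogate) is replaced by `replacement`.
    static String sanitize(String text, char replacement) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (isXmlChar(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(replacement);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "abc" + (char) 0 + "def";
        System.out.println(sanitize(raw, ' ')); // prints "abc def"
    }
}
```

Replacing each forbidden character instead of deleting it keeps the string length unchanged (every forbidden code point is a single UTF-16 unit), so offsets computed on the raw text still line up with the cleaned text.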

Dennis Geller, Ph.D. Computer and Communication Science   
Senior Software Developer
Direct Dial: 781.496.2461   Main Number: 781.935.3966 ext. 261   
Fax Number:  781.496-2498
E-mail:  dgeller@aptima.com
Aptima, Inc.
12 Gill Street, Suite 1400
Woburn, MA 01801 USA

