poi-user mailing list archives

From Nick Burch <n...@torchbox.com>
Subject Re: No-break space and middle dot in String produced with WordExtractor
Date Fri, 14 Jul 2006 09:36:06 GMT
On Thu, 13 Jul 2006, nguessan@kouame.us wrote:
> I used WordExtractor to extract texts from MS Word documents. The 
> documents have many non-text characters that display as squares, and 
> sometimes as lines. However, most of the texts appear clearly. I did hex 
> dumps of the texts and found that some squares have the values A0 and 
> some have B7. I tried to remove them using the String method "String 
> replace(char oldChar, char newChar)", but it does not remove them.

That sounds like a string replacement issue rather than a POI issue. My
guess is that you're not correctly identifying the codes for the
characters. Most good introductory Java books should help you there.
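
For illustration, here is a minimal sketch of the replacement in question. It
assumes the A0 and B7 values from the hex dump are the Unicode code points
U+00A0 (no-break space) and U+00B7 (middle dot) named in the subject line. The
class and method names are hypothetical and not part of POI. Java strings are
immutable, so replace() returns a new String that has to be kept. The
char-for-char overload can only swap one character for another, so removing a
character outright needs the overload that takes strings.

```java
// Hypothetical helper, not part of POI: cleans text already extracted to a String.
class WordTextCleaner {

    // Assumption: the squares are U+00A0 (no-break space) and U+00B7 (middle dot).
    static String clean(String text) {
        return text
                // Swap each no-break space for an ordinary space.
                .replace('\u00A0', ' ')
                // Remove middle dots entirely by replacing them with an empty string.
                .replace("\u00B7", "");
    }

    public static void main(String[] args) {
        String extracted = "Hello\u00A0world\u00B7";
        // replace() does not modify 'extracted'; use the returned value.
        String cleaned = clean(extracted);
        System.out.println(cleaned);
    }
}
```

If the characters are still there afterwards, it may be worth printing the
numeric value of each char in the string to confirm which code points are
actually present.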


