poi-user mailing list archives

From MSB <markbrd...@tiscali.co.uk>
Subject Re: Extracting plaintext from word
Date Tue, 15 Sep 2009 06:13:07 GMT

Have a look at the static stripFields() method defined in the Range class
(org.apache.poi.hwpf.usermodel.Range); it usually does the trick and removes
those 'odd' characters, which are the markers Word uses to delimit fields.

If this does not work (and I fully believe that it will), all you need to do
is identify the characters in question, for instance by printing out their
integer values, and then use Java's own String handling methods to remove them.
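
As a rough illustration of that fallback, here is a minimal sketch that uses
only the standard library. The class and method names (ControlCharStripper,
controlCodes, stripControlChars) are made up for this example and are not part
of POI, and the sample string is invented. It assumes the unwanted characters
are control characters such as 19 and 21, as in the question. Note that it only
deletes the marker characters themselves; any field instruction text sitting
between them is left in place.

```java
import java.util.ArrayList;
import java.util.List;

public class ControlCharStripper {

    // Collects the integer value of every control character in the text,
    // so the offending characters can be identified first.
    static List<Integer> controlCodes(String text) {
        List<Integer> codes = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isISOControl(c)) {
                codes.add((int) c);
            }
        }
        return codes;
    }

    // Removes ASCII control characters, but keeps tab, line feed and
    // carriage return so the layout of the text survives.
    static String stripControlChars(String text) {
        return text.replaceAll("[\\p{Cntrl}&&[^\\t\\n\\r]]", "");
    }

    public static void main(String[] args) {
        // Invented sample containing characters 19 and 21.
        String sample = "Hello\u0013 world\u0015\n";
        System.out.println(controlCodes(sample));
        System.out.print(stripControlChars(sample));
    }
}
```

If only a couple of specific characters need to go, a plain
text.replace("\u0013", "") per character would do the same job without a
regular expression.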


Mark B

Drone42 wrote:
> I'm using POI with Aperture to extract plain text from Word documents.
> The extracted text contains Word special characters, e.g. '&#21; &#19;',
> which I would like to remove. Does POI provide a configuration option to
> get only the true plain text?
> Regards,
> Gert.

View this message in context: http://www.nabble.com/Extracting-plaintext-from-word-tp25440878p25448520.html