poi-user mailing list archives

From Nick Burch <n...@torchbox.com>
Subject Re: Query regarding encoding used internally by apache POI libraries
Date Tue, 08 Sep 2009 09:48:29 GMT
On Tue, 8 Sep 2009, Som Satpathy wrote:
> Does apache POI follow any particular encoding internally while 
> extracting MS office documents? If so what is the encoding that POI 
> uses?

POI is written in Java, so it uses native Java strings almost everywhere.
These are Unicode (stored internally as UTF-16).
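
For illustration, here is a minimal sketch using only the JDK (not POI
code; the class name JavaStrings is made up) of what "native Java
strings are Unicode" means in practice. A String holds UTF-16 code units,
so characters outside the Basic Multilingual Plane take two units, and a
byte encoding only comes back into play when you read or write bytes.

```java
import java.nio.charset.StandardCharsets;

public class JavaStrings {
    // "café " plus U+1F600 (written as its UTF-16 surrogate pair)
    static final String SAMPLE = "caf\u00e9 \uD83D\uDE00";

    public static void main(String[] args) {
        // Number of UTF-16 code units: the emoji counts as 2
        System.out.println(SAMPLE.length());
        // Number of actual Unicode code points
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));
        // A byte encoding is chosen only at an I/O boundary, e.g. UTF-8 output
        System.out.println(SAMPLE.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Running it prints 7, 6 and 10: seven UTF-16 units, six code points, and
ten bytes once encoded as UTF-8.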

The Microsoft file formats generally store text as either US-ASCII (one
byte per character) or UCS-2 (two bytes per character, little-endian).
The type of the record/block/etc. tells you which it is, so we can turn
it into Java (Unicode) strings.
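
A minimal sketch of that last step, again using only the JDK rather than
POI's own classes. RecordText, decode() and the isUcs2 flag are all
hypothetical: isUcs2 stands in for whatever the record type or option
flag actually says about how the text is stored.

```java
import java.nio.charset.StandardCharsets;

public class RecordText {
    /**
     * Hypothetical helper: converts raw record bytes to a Java String.
     * isUcs2 is an assumed stand-in for the record type / flag.
     */
    static String decode(byte[] raw, boolean isUcs2) {
        if (isUcs2) {
            // Two bytes per character, little-endian; UCS-2 is the BMP
            // subset of UTF-16, so the UTF-16LE decoder handles it
            return new String(raw, StandardCharsets.UTF_16LE);
        }
        // One byte per character
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] ascii = {0x50, 0x4F, 0x49};                   // "POI", 1 byte/char
        byte[] ucs2 = {0x50, 0x00, 0x4F, 0x00, 0x49, 0x00};  // "POI", UCS-2LE
        // Both byte layouts end up as the same Java String
        System.out.println(decode(ascii, false).equals(decode(ucs2, true)));
    }
}
```

Once decoded, the caller never needs to know which storage form the
file used; everything downstream just works with Java strings.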

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

