poi-user mailing list archives

From Som Satpathy <somsatpa...@gmail.com>
Subject Re: Query regarding encoding used internally by apache POI libraries
Date Wed, 09 Sep 2009 03:30:50 GMT
"The microsoft file formats generally store text as either US-ASCII or
UCS-2. The type of the record/block/etc tells you which it is, so we can
turn that into java (unicode) strings"

Thanks for the input, Nick. One thing is still not clear, though: can I encode
the extracted text as UTF-8?

When I extract non-English text such as French or Japanese, the
output is unreadable.

Is there any way to handle the encoding of non-English text using POI?
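
To make the UTF-8 question concrete, here is a minimal sketch in plain Java. It assumes the extractor has already returned a correctly decoded java.lang.String, and that the unreadable output comes from a later step that writes the text with a charset that cannot represent it; that cause is an assumption, not something confirmed in this thread. The class name Utf8Output and the sample string are hypothetical, and no POI calls are used.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Output {
    // Encodes an already-decoded Java string as UTF-8 bytes.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Encodes as US-ASCII; characters outside ASCII are replaced with '?'.
    static byte[] encodeAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Stand-in for extractor output: "français 日本語" (hypothetical sample).
        String extracted = "fran\u00e7ais \u65e5\u672c\u8a9e";

        // UTF-8 can represent every character, so the round trip is lossless.
        String viaUtf8 = new String(encodeUtf8(extracted), StandardCharsets.UTF_8);
        System.out.println(viaUtf8.equals(extracted)); // prints "true"

        // US-ASCII cannot, so the non-ASCII characters are lost.
        String viaAscii = new String(encodeAscii(extracted), StandardCharsets.US_ASCII);
        System.out.println(viaAscii.equals(extracted)); // prints "false"
    }
}
```

When writing to a file or stream instead of a byte array, the same idea applies: pass the charset explicitly, for example new OutputStreamWriter(out, StandardCharsets.UTF_8), rather than relying on the platform default.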


Regards,
Som

On Tue, Sep 8, 2009 at 3:18 PM, Nick Burch <nick@torchbox.com> wrote:

> On Tue, 8 Sep 2009, Som Satpathy wrote:
>
>> Does apache POI follow any particular encoding internally while extracting
>> MS office documents? If so what is the encoding that POI uses?
>>
>
> POI is written in Java, so uses native java strings almost everywhere.
> These are unicode
>
> The microsoft file formats generally store text as either US-ASCII or
> UCS-2. The type of the record/block/etc tells you which it is, so we can
> turn that into java (unicode) strings
>
> Nick
>
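
The decoding step Nick describes (the record type says whether the bytes are US-ASCII or UCS-2, and they are turned into a Java string accordingly) can be sketched roughly as follows. This is not POI's actual code: the class RecordTextDecoder and the isSixteenBit flag are hypothetical stand-ins for whatever the record type provides, and the sketch assumes the 16-bit text is little-endian, which Java exposes as UTF-16LE.

```java
import java.nio.charset.StandardCharsets;

public class RecordTextDecoder {
    // Hypothetical helper: 'isSixteenBit' stands in for the flag that the
    // record/block type would supply in a real file format reader.
    static String decode(byte[] raw, boolean isSixteenBit) {
        return isSixteenBit
                ? new String(raw, StandardCharsets.UTF_16LE) // UCS-2, little-endian (assumed)
                : new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "POI" stored one byte per character.
        byte[] ascii = {0x50, 0x4F, 0x49};
        // U+65E5 U+672C stored two bytes per character, low byte first.
        byte[] ucs2 = {(byte) 0xE5, 0x65, 0x2C, 0x67};

        System.out.println(decode(ascii, false));                       // prints "POI"
        System.out.println(decode(ucs2, true).equals("\u65e5\u672c")); // prints "true"
    }
}
```

Once the text is a Java string, the storage encoding inside the Office file no longer matters; the output encoding is chosen separately when the string is written out.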
