poi-user mailing list archives

From "Doppelhofer Andreas" <Andreas.Doppelho...@salomon.at>
Subject AW: AW: AW: how to set character encoding in new doc file
Date Mon, 25 Jan 2010 09:21:12 GMT
I want to store Unicode characters in a Word doc, but when I store Russian
characters, only "?" is displayed, even though these characters exist in Unicode.
I think the strings themselves are correct Unicode, because when I print them to
stdout they are displayed correctly.

This sample gets the text from the doc and prints it to stdout:

            System.out.println("#########");
            // Collect the text of every piece in the document's text table.
            StringBuffer buffer = new StringBuffer();
            Iterator textPieces = mydoc_output.getTextTable().getTextPieces().iterator();
            while (textPieces.hasNext()) {
                TextPiece piece = (TextPiece) textPieces.next();
                try {
                    // Decode the raw bytes of the piece as UTF-16LE.
                    buffer.append(new String(piece.getRawBytes(), "UTF-16LE"));
                } catch (UnsupportedEncodingException e) {
                    throw new InternalError("Standard Encoding UTF-16LE not found, JVM broken");
                }
            }
            String text1 = buffer.toString();
            System.out.println(text1);
            System.out.println("+#+#+#+#+#+");
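
The loop above decodes every piece as UTF-16LE. Since Word files can hold text as Unicode or non-Unicode, a more defensive decode might pick the charset per piece. The following is only a sketch in plain Java: the class PieceDecoder, the method decodePiece and its isUnicode flag are hypothetical names, not HWPF API, and windows-1252 as the 8-bit encoding is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI.
class PieceDecoder {
    // Assumption: non-Unicode pieces use the windows-1252 code page.
    static final Charset EIGHT_BIT = Charset.forName("windows-1252");

    // Decode raw bytes as UTF-16LE when the piece is flagged as Unicode,
    // otherwise as the assumed 8-bit code page.
    static String decodePiece(byte[] raw, boolean isUnicode) {
        return new String(raw, isUnicode ? StandardCharsets.UTF_16LE : EIGHT_BIT);
    }

    public static void main(String[] args) {
        byte[] cyrillicA = {0x10, 0x04};   // U+0410 in UTF-16LE
        byte[] latinA = {0x41};            // 'A' in an 8-bit encoding
        System.out.println(decodePiece(cyrillicA, true));
        System.out.println(decodePiece(latinA, false));
    }
}
```

Using the Charset overload of the String constructor also avoids the checked UnsupportedEncodingException.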

e.g.
#########
ﻱﺑẬ


"April"
"Апрель"
+#+#+#+#+#+

When I then add text1 to the range, I get only "?" for the Russian characters.
--begin output word doc 

???

"April"
"??????" 
-- end word doc
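
The pattern in this output (ASCII text intact, each Russian character replaced by exactly one "?") is what Java produces when a String is encoded into a charset that has no mapping for those characters. A minimal sketch of that effect in plain Java, assuming the lossy step is an 8-bit encoding such as ISO-8859-1 somewhere on the write path; whether HWPF actually does this is not confirmed here, and the class LossyEncodingDemo is a hypothetical name:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of POI.
class LossyEncodingDemo {
    // Encode to an 8-bit charset and decode again, as a lossy write path might.
    // Characters without a mapping are replaced by the charset's '?' byte.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("April"));
        // "\u0410\u043F\u0440\u0435\u043B\u044C" is the Russian word for April
        System.out.println(roundTrip("\u0410\u043F\u0440\u0435\u043B\u044C"));
    }
}
```

If this is the cause, the String passed to the range is fine and the loss happens when the text is converted to bytes for storage.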

dops



> -----Original Message-----
> From: MSB [mailto:markbrdsly@tiscali.co.uk]
> Sent: Friday, 22 January 2010 15:16
> To: user@poi.apache.org
> Subject: Re: AW: AW: how to set character encoding in new doc file
> 
> 
> Hello Andreas,
> 
> I think that Nick is referring to explicitly encoding the
> Strings using the required/desired character encoding; there
> are constructors for the java.lang.String class that allow
> you to specify the character encoding of the bytes you
> strip from the String you have read.
> 
> Remember that HWPF is still very immature as an API and it
> could well be that the sort of thing you are asking for has
> not yet been included. The best long-term solution may be to
> join the development team and contribute.
> 
> Yours
> 
> Mark B
> 
> 
> Doppelhofer Andreas wrote:
> > 
> > I use HWPFDocument(...) to read the document. When I print the string
> > (some text in the doc) to stdout/stderr, all characters are displayed
> > correctly, but when I write it to a new doc file, all Russian
> > characters are stored as "?".
> > 
> > This is OK:
> > System.out.println(line);
> > 
> > This is not OK (after opening with Word):
> > range.insertAfter(line);
> > 
> > dops
> > 
> >> -----Original Message-----
> >> From: Nick Burch [mailto:nick.burch@alfresco.com]
> >> Sent: Friday, 22 January 2010 11:20
> >> To: POI Users List
> >> Subject: Re: AW: how to set character encoding in new doc file
> >> 
> >> On Fri, 22 Jan 2010, Doppelhofer Andreas wrote:
> >> > Can anybody help me with this problem?
> >> 
> >> Word (plus Excel, PowerPoint etc.) can store strings as Unicode or
> >> non-Unicode. POI works only with Java Unicode strings, and handles
> >> reading and writing the strings to the appropriate kinds of bytes for
> >> you.
> >> 
> >> Make sure you're correctly passing your strings as Unicode into Java,
> >> converting the encoding as needed.
> >> 
> >> Nick
> >> 
> >> 

-- 


Salomon Automation GmbH - Friesachstrasse 15 - A-8114 Friesach bei Graz
Sitz der Gesellschaft: Friesach bei Graz
UID-NR:ATU28654300 - Firmenbuchnummer: 49324 K
Firmenbuchgericht: Landesgericht für Zivilrechtssachen Graz



