poi-user mailing list archives

From Roel De Nijs <hyste...@telenet.be>
Subject Retrieving style from a Word-document
Date Tue, 08 Sep 2009 19:50:15 GMT
Hi all,

I'm trying to read the character formatting (bold, italic, underline) of the
text in a Word 2002 document with HWPF. I'm using the following code:

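    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.usermodel.CharacterRun;
    import org.apache.poi.hwpf.usermodel.Paragraph;
    import org.apache.poi.hwpf.usermodel.Range;

    // Prints the document text, wrapping each character run in [b], [i] and
    // [u] markers according to the run's formatting.
    private static void printTextWithStyle(HWPFDocument doc) throws Exception {
        StringBuilder sb = new StringBuilder();
        Range range = doc.getRange();
        for (int i = 0; i < range.numParagraphs(); i++) {
            Paragraph paragraph = range.getParagraph(i);
            for (int j = 0; j < paragraph.numCharacterRuns(); j++) {
                CharacterRun run = paragraph.getCharacterRun(j);
                // Opening markers: bold, italic, underline.
                sb.append(run.isBold() ? "[b]" : "");
                sb.append(run.isItalic() ? "[i]" : "");
                sb.append(run.getUnderlineCode() > 0 ? "[u]" : "");
                sb.append(run.text());
                // Closing markers in reverse order, so the tags nest properly.
                sb.append(run.getUnderlineCode() > 0 ? "[/u]" : "");
                sb.append(run.isItalic() ? "[/i]" : "");
                sb.append(run.isBold() ? "[/b]" : "");
            }
        }
        System.out.println(sb);
    }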
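
The tag nesting used in the loop above can be checked in isolation from HWPF,
which helps rule out the formatting logic itself as the cause of any missing
markers. The following is only a minimal sketch: the class StyleTags and the
helper wrapWithTags are hypothetical names, not part of POI, and the boolean
flags stand in for the values that isBold(), isItalic() and
getUnderlineCode() > 0 would return for a run.

```java
public class StyleTags {

    // Hypothetical helper (not part of POI): wraps text in [b]/[i]/[u]
    // markers, opening in the order bold, italic, underline and closing
    // in the reverse order, like the loop in printTextWithStyle.
    static String wrapWithTags(String text, boolean bold, boolean italic,
                               boolean underline) {
        StringBuilder sb = new StringBuilder();
        if (bold) sb.append("[b]");
        if (italic) sb.append("[i]");
        if (underline) sb.append("[u]");
        sb.append(text);
        if (underline) sb.append("[/u]");
        if (italic) sb.append("[/i]");
        if (bold) sb.append("[/b]");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A bold and italic run, an underlined run, and an unformatted run.
        System.out.println(wrapWithTags("price", true, true, false));
        System.out.println(wrapWithTags("liefst", false, false, true));
        System.out.println(wrapWithTags("plain", false, false, false));
    }
}
```

If this helper produces correctly nested markers for every flag combination,
any marker missing from the real output would have to come from the flags
reported for the run rather than from the string building.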

When I run this code with the document you can download at
http://www.orbitfiles.com/download/id5274871546.html, I get the following output:

    De handleidingen worden geleverd op cd-rom ( in pdf- formaat ). Verzendings-
    en behandelingskosten bedragen [b][i]2 €[/i][/b]. Indien u nog vragen heeft ,
    kunt u mij via e-mail of telefonisch bereiken ([u]liefst[/u] met vermelding van
    de referentie die bij elke handleiding staat ).
    ...
    [b]Audi TT Coupe 2000-2004 , Roadster 2001-2004 Manual[/b]
    taal : Engels , meer dan 400 blz. , 6 € , referentie : audi003
    Audi RS4 Avant 2.7 Biturbo Training Guide
    taal : Engels , 39 blz. , 7 € , referentie : audi004
    Audi 100 / Audi  100 Quattro 1990 Owner's Manual
    taal : Engels , 180 blz. , 5 € , referentie : audi005

The beginning of the document is processed as expected and produces the
expected output. In the last four paragraphs, however, the style information
is missing, and I don't know why. In addition, numParagraphs returns 8,
whereas I would expect 12.

Can somebody point out what the problem is? Or am I perhaps not retrieving
this information the correct way?

Kind regards,
Roel

