incubator-odf-users mailing list archives

From Devin Han <devin...@apache.org>
Subject Re: Is there a way to extract text on a page basis from odt ?
Date Mon, 26 Sep 2011 05:46:27 GMT
Hi Ram,

I suppose you only want to extract the text (header, footer, comments,
endnotes, etc.) and don't care about page breaks.
Please see the sample code below.

       TextDocument textdoc = (TextDocument) TextDocument.loadDocument("textExtractor.odt");
       EditableTextExtractor extractorD = EditableTextExtractor.newOdfEditableTextExtractor(textdoc);
       String output = extractorD.getText();
       System.out.println(output);

This code fragment returns all of the content except the header and
footer. For the header and footer content, please see the following:
            Header header = textdoc.getHeader();
            output = TextExtractor.getText(header.getOdfElement());
            System.out.println(output);

            Footer footer = textdoc.getFooter();
            output = TextExtractor.getText(footer.getOdfElement());
            System.out.println(output);
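
For illustration only, here is a minimal sketch that does not use the Simple API at all. It relies on the ODF package layout: an .odt file is a ZIP archive whose body text lives in content.xml, while header and footer text typically lives in styles.xml (inside the master page definitions). That separation is presumably why the body and the header/footer are fetched in separate steps above. The class name OdtRawText and its helper methods are hypothetical, and the sketch is far cruder than TextExtractor: it ignores text:s, text:tab and text:line-break, and it concatenates nested paragraphs without separators.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, JDK only; not part of the ODF Toolkit.
class OdtRawText {
    // XML namespace of text:p and text:h in OpenDocument.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Returns the text of each outermost text:p / text:h, one per line. */
    static String paragraphs(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xml);
        StringBuilder out = new StringBuilder();
        collect(doc.getDocumentElement(), out);
        return out.toString();
    }

    // Depth-first walk; stops descending once a paragraph or heading is found.
    private static void collect(Node node, StringBuilder out) {
        boolean isParagraph = node.getNodeType() == Node.ELEMENT_NODE
                && TEXT_NS.equals(node.getNamespaceURI())
                && ("p".equals(node.getLocalName()) || "h".equals(node.getLocalName()));
        if (isParagraph) {
            out.append(node.getTextContent()).append('\n');
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    /** Extracts paragraph text from one entry of the ODT (ZIP) package. */
    static String entryText(String odtPath, String entryName) throws Exception {
        try (ZipFile zip = new ZipFile(odtPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                return "";
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return paragraphs(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            System.out.print(entryText(args[0], "content.xml")); // body text
            System.out.print(entryText(args[0], "styles.xml"));  // header/footer text
            return;
        }
        // No file given: run on a tiny in-memory sample instead.
        String sample = "<office:text"
                + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
                + " xmlns:text=\"" + TEXT_NS + "\">"
                + "<text:p>Hello <text:span>world</text:span></text:p></office:text>";
        System.out.print(paragraphs(
                new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))));
    }
}
```

Running it as `java OdtRawText textExtractor.odt` would print the body paragraphs followed by whatever paragraph text styles.xml holds. For real use, the Simple API calls shown above are the better choice.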

For more about TextExtractor, please see:
http://incubator.apache.org/odftoolkit/simple/document/cookbook/TextExtractor.html#Get%20Text
There is also a demo about extracting text:
http://incubator.apache.org/odftoolkit/simple/demo/demo2.html
If you have never used the Simple API before, please see this guide:
http://incubator.apache.org/odftoolkit/simple/gettingstartguide.html

2011/9/24 Ram Kane <ramdkane@gmail.com>

> Hi,
>
> I need to extract all text (header, footer, comments, endnote, etc) from an
> ODT document. I need to do it on a page by page basis. I'm aware that ODTs
> are basically structured by paragraphs and headings, but i'd like to know
> if
> there's a way to achieve what i need.
>
> Thanks a lot.
>



-- 
-Devin
