incubator-odf-users mailing list archives

From Ram Kane <ramdk...@gmail.com>
Subject Re: Is there a way to extract text on a page basis from odt ?
Date Wed, 28 Sep 2011 15:45:56 GMT
On Tue, Sep 27, 2011 at 10:38 PM, Devin Han <devinhan@apache.org> wrote:
>
>
> 2011/9/26 Ram Kane <ramdkane@gmail.com>
>>
>> I've tried that. The problem is that it works at the document level.
>>
>> I need to be able to extract content for a given page.
>
> Does it make sense to extract content by paragraph?


Only if I could associate those paragraphs with their corresponding page
numbers. But I think that is the whole problem (parsing the document as
a series of pages) :/.
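
One possible angle, offered as a rough sketch rather than a confirmed
solution: the ODF format lets a producing application record where its
layout engine broke pages by writing <text:soft-page-break/> elements
into content.xml (flagged by text:use-soft-page-breaks="true" on
office:text). Assuming the document was saved with those markers, the
text can be cut into per-page chunks with nothing but the JDK. The class
and method names below are hypothetical and are not part of the ODF
Toolkit. Hard page breaks set through paragraph styles are not handled,
and a file saved without soft page breaks comes back as a single chunk.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not an ODF Toolkit class.
class SoftPageBreakSplitter {
    // Namespace of the ODF text: prefix.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Returns one string per page, splitting wherever a
    // text:soft-page-break element occurs in the content.xml stream.
    static List<String> splitByPage(InputStream contentXml) {
        List<String> pages = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = factory.createXMLStreamReader(contentXml);
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && TEXT_NS.equals(reader.getNamespaceURI())
                        && "soft-page-break".equals(reader.getLocalName())) {
                    // A recorded page boundary: close the current page.
                    pages.add(current.toString());
                    current = new StringBuilder();
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    current.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && TEXT_NS.equals(reader.getNamespaceURI())
                        && ("p".equals(reader.getLocalName())
                            || "h".equals(reader.getLocalName()))) {
                    // End of a paragraph or heading: emit a line break.
                    current.append('\n');
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse content.xml", e);
        }
        pages.add(current.toString());
        return pages;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SoftPageBreakSplitter file.odt");
            return;
        }
        // An .odt file is a zip archive; the body lives in content.xml.
        try (ZipFile zip = new ZipFile(args[0]);
             InputStream in = zip.getInputStream(zip.getEntry("content.xml"))) {
            List<String> pages = splitByPage(in);
            for (int i = 0; i < pages.size(); i++) {
                System.out.println("--- page " + (i + 1) + " ---");
                System.out.print(pages.get(i));
            }
        }
    }
}
```

Whether this is usable depends entirely on the application that saved
the file: the page boundaries are whatever that application's layout
produced at save time, not something computed here.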



>>
>> Thx a lot for the code though.
>>
>>
>> On Mon, Sep 26, 2011 at 2:46 AM, Devin Han <devinhan@apache.org> wrote:
>> > Hi Ram,
>> >
>> > I suppose you only want to extract the text (header, footer, comments,
>> > endnotes, etc.) and don't care about page breaks.
>> > Please see the sample code.
>> >
>> >       TextDocument textdoc = (TextDocument) TextDocument.loadDocument("textExtractor.odt");
>> >       EditableTextExtractor extractorD = EditableTextExtractor.newOdfEditableTextExtractor(textdoc);
>> >       String output = extractorD.getText();
>> >       System.out.println(output);
>> >
>> > This code fragment will return all of the content except the header and
>> > footer. For content in the header and footer, please refer to the following:
>> >       Header header = textdoc.getHeader();
>> >       output = TextExtractor.getText(header.getOdfElement());
>> >       System.out.println(output);
>> >
>> >       Footer footer = textdoc.getFooter();
>> >       output = TextExtractor.getText(footer.getOdfElement());
>> >       System.out.println(output);
>> >
>
>
>
> --
> -Devin
>
