poi-user mailing list archives

From Nick Burch <n...@torchbox.com>
Subject Re: Problem Extracting Text from MS Word
Date Tue, 15 Aug 2006 13:51:55 GMT
On Tue, 15 Aug 2006, manumohedano@usal.es wrote:
> The problem is I also get text from internal information of MSWord, for
> example, the hyperlinks like this:
>
>   "4.1- Introducción PAGEREF _Toc142772733 \h 31
> HYPERLINK \l "_Toc142772734" 4.2- Apple webobjects PAGEREF _Toc142772734
> \h 32"
>
> Can you give me any solution??

Alas, not really. It looks like these field codes are stored in character
runs, so they're being returned when you ask a paragraph for its runs.

You could try looking at the range type to see whether these problem runs
have a different type that you can exclude. Otherwise, patches to make HWPF
behave better are always appreciated :)

Nick
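
As a possible workaround while HWPF returns these runs, the extracted text
could be post-processed. This is a minimal sketch and not the run-type check
suggested above. It relies on Word marking fields with the control
characters 0x13 (field begin), 0x14 (separator) and 0x15 (field end), and it
assumes those characters are still present in the text you get back, which
may not hold for every document. The class and method names are
hypothetical and nothing here is part of POI.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FieldCodeStripper {
    // Word field marker characters; assumed to survive in the extracted text.
    private static final char FIELD_BEGIN = '\u0013';
    private static final char FIELD_SEPARATOR = '\u0014';
    private static final char FIELD_END = '\u0015';

    /**
     * Drops field instructions such as HYPERLINK \l "..." or PAGEREF ... \h
     * and keeps the field results (the text Word actually displays).
     */
    public static String stripFieldCodes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // One flag per open field: true while we are in its instruction part.
        Deque<Boolean> inCode = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            if (c == FIELD_BEGIN) {
                inCode.push(Boolean.TRUE);
            } else if (c == FIELD_SEPARATOR) {
                // Switch the innermost field from instruction to result.
                if (!inCode.isEmpty()) {
                    inCode.pop();
                    inCode.push(Boolean.FALSE);
                }
            } else if (c == FIELD_END) {
                if (!inCode.isEmpty()) {
                    inCode.pop();
                }
            } else if (!inCode.contains(Boolean.TRUE)) {
                // Keep the character only if no enclosing field is still
                // in its instruction part.
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

You would call FieldCodeStripper.stripFieldCodes(...) on each paragraph's
text before using it. Fields can nest (a HYPERLINK whose result contains a
PAGEREF, as in the table of contents quoted above), which is why the sketch
keeps a stack and not a single flag.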
