poi-user mailing list archives

From Jack Cheng <jackguitaronl...@yahoo.com>
Subject Using POI (POIFS, HWPF) to parse MS word and extract necessary info?
Date Tue, 04 Nov 2003 14:21:12 GMT
Hi all,

Recently I have been looking for a Java-based solution
for reading MS Word documents and merging them into
one. After some googling, I came across this POI page.
The features of the not-yet-released HWPF are exactly
what we need, but sadly it's still in an early
development phase.

I'm not sure whether it would be feasible (with a
reasonable amount of effort) to implement the
functions I need on top of the underlying POIFS.
Could anybody kindly shed some light on this?

The actual scenario is as follows:
1. We need to break down a Word file with a hierarchy
into separate Word files. Each level in the hierarchy
is identified by its style ("Heading 1", "Heading 2",
etc.).

    My problems are:
    a) Would the content in POIFS be stored
sequentially, so that the different levels of the
hierarchy can easily be extracted in code?
    b) Is it possible to identify the different
sections by style alone?

===========================================

The Word document would look something like this:

---------------------------------------
Heading 1
  some text under the top level

  Heading 2
    some text under the second level

      Heading 3
        ....
  Heading 2
    ....
---------------------------------------

The broken-down version would look like this:

---------------------------------------
Heading 1
  some text under the top level
---------------------------------------

---------------------------------------
  Heading 2
    some text under the second level
---------------------------------------

etc...

==========================================
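This relates to questions 1a and 1b. Assume a parser (HWPF or anything else) can return the paragraphs in document order, together with each paragraph's style name. The split shown above then reduces to starting a new section at every heading paragraph. The sketch below is a minimal illustration under those assumptions. It uses a hypothetical `Para` record in place of any real POI class. It also treats only style names of the form "heading 1" to "heading 9" (case-insensitive) as headings. Writing each section back out as a .doc file is not shown.

```java
import java.util.ArrayList;
import java.util.List;

public class HeadingSplitter {
    // Hypothetical paragraph model (style name + text); NOT a POI class.
    // A real parser would have to supply these values in document order.
    record Para(String style, String text) {}

    // Assumption: headings use Word's built-in names "heading 1".."heading 9".
    // Custom heading styles would need their own matching rule.
    static boolean isHeading(Para p) {
        return p.style().matches("(?i)heading [1-9]");
    }

    // Outline level of a heading paragraph, e.g. "Heading 2" -> 2.
    // Only valid when isHeading(p) is true.
    static int level(Para p) {
        return p.style().charAt(p.style().length() - 1) - '0';
    }

    // Start a new section at every heading; body paragraphs attach to the
    // most recent heading. Paragraphs before the first heading form their
    // own leading section.
    static List<List<Para>> split(List<Para> paras) {
        List<List<Para>> sections = new ArrayList<>();
        List<Para> current = null;
        for (Para p : paras) {
            if (current == null || isHeading(p)) {
                current = new ArrayList<>();
                sections.add(current);
            }
            current.add(p);
        }
        return sections;
    }

    public static void main(String[] args) {
        List<Para> doc = List.of(
            new Para("heading 1", "Heading 1"),
            new Para("Normal", "some text under the top level"),
            new Para("heading 2", "Heading 2"),
            new Para("Normal", "some text under the second level"),
            new Para("heading 3", "Heading 3"),
            new Para("heading 2", "Heading 2"));
        for (List<Para> section : split(doc)) {
            Para first = section.get(0);
            String label = isHeading(first) ? "level " + level(first) : "preamble";
            System.out.println("--- " + label + ": " + section.size() + " paragraph(s)");
        }
    }
}
```

Because each section carries its heading paragraph, the output file for a section can be named or nested using `level(...)` of that first paragraph.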

2. We will need to merge the Word files into one
single file.

    My problems are:
    a) Would it be as easy as simple concatenation?
(too ideal? LoL)
    b) Can it be done easily with POIFS?
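This relates to question 2a. Once each file has been parsed into paragraphs, a merge at the level of a logical model is plain list concatenation in the desired order. That does not hold at the file level. A binary .doc file is an OLE2 compound document with its own header and sector allocation tables, so appending one file's bytes to another's does not produce a valid merged document. The minimal sketch below assumes a hypothetical `Para` record (style name plus text, not a POI class). Serialising the merged list back to .doc is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class SectionMerger {
    // Hypothetical paragraph model (style name + text); NOT a POI class.
    record Para(String style, String text) {}

    // Logical merge: append each part's paragraphs, preserving part order.
    // This is not byte-level concatenation of .doc files, which would not
    // yield a valid Word document.
    static List<Para> merge(List<List<Para>> parts) {
        List<Para> merged = new ArrayList<>();
        for (List<Para> part : parts) {
            merged.addAll(part);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Para> a = List.of(
            new Para("heading 1", "Heading 1"),
            new Para("Normal", "some text under the top level"));
        List<Para> b = List.of(
            new Para("heading 2", "Heading 2"),
            new Para("Normal", "some text under the second level"));
        for (Para p : merge(List.of(a, b))) {
            System.out.println(p.style() + ": " + p.text());
        }
    }
}
```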


I really hope you could kindly give me some comments
and suggestions on this, or perhaps some pointers. I'm
also looking into other possible approaches, e.g. the
OpenOffice Java API, but so far I don't know much
about it. :-P

Actually this is quite (frankly, "very") urgent, so I
hope you can reply as soon as possible. Thank you so
much! :-D

Best Regards,
Jack Cheng


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: poi-user-help@jakarta.apache.org

