poi-user mailing list archives

From MSB <markbrd...@tiscali.co.uk>
Subject Re: Extract Text with style/type information
Date Tue, 26 Jan 2010 17:49:51 GMT

In that case, you will want to look at the specification for the file
format, which in this case is public. As a start, take a look here
- http://office.microsoft.com/en-us/products/HA101723691033.aspx
- and if you look down the page, there is an entry that says something like
"Where can I find more information about the OpenXML format..."; click on
this and there are links to other developer resources that may be valuable.
ECMA-376, also published as ISO/IEC 29500, is the OpenXML file format
standard that Microsoft are supposed to ensure OpenXML documents abide
by. You should be able to find either of these documents on the web,
and they may help.
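
As a rough illustration of the "no XWPF" route, here is a minimal sketch that
uses only the JDK. It assumes the main document part sits at
word/document.xml inside the .docx zip (the usual location, though strictly
it is found via the package relationships), and that a heading is marked by a
w:pStyle element whose w:val is a style id such as "Heading1". The style ids
are defined per document in word/styles.xml, so treat that name as an
assumption. The class and method names are made up for this example.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI: lists each paragraph with its style id.
class DocxParagraphs {
    // WordprocessingML main namespace, as defined by ECMA-376.
    static final String W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one "styleId|text" entry per w:p element.
    // styleId is empty when the paragraph carries no w:pStyle.
    public static List<String> readParagraphs(InputStream documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = factory.newDocumentBuilder().parse(documentXml);

            List<String> result = new ArrayList<>();
            NodeList paragraphs = doc.getElementsByTagNameNS(W, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element p = (Element) paragraphs.item(i);

                // The paragraph style, if any, lives in w:pPr/w:pStyle/@w:val.
                NodeList styles = p.getElementsByTagNameNS(W, "pStyle");
                String style = styles.getLength() > 0
                    ? ((Element) styles.item(0)).getAttributeNS(W, "val")
                    : "";

                // Text is split across runs; concatenate every w:t descendant.
                // (Paragraphs nested in text boxes would be counted twice.)
                StringBuilder text = new StringBuilder();
                NodeList texts = p.getElementsByTagNameNS(W, "t");
                for (int j = 0; j < texts.getLength(); j++) {
                    text.append(texts.item(j).getTextContent());
                }
                result.add(style + "|" + text);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document.xml", e);
        }
    }

    // Opens the .docx as a zip and parses the assumed main part.
    public static List<String> readDocx(String path) {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IllegalStateException("no word/document.xml in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return readParagraphs(in);
            }
        } catch (java.io.IOException e) {
            throw new IllegalStateException("could not read " + path, e);
        }
    }
}
```

Calling DocxParagraphs.readDocx("report.docx") (the file name is only an
example) should give one entry per paragraph, which you could then map onto
your own xml elements according to the style id. The specification remains
the place to check which other tags you need to handle.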


Mark B

markl16 wrote:
> Yup, I'd imagine the job can be done without XWPF; after all, docx is a type
> of xml anyway. I just need to learn what tags in docx xml represent a
> header or plain paragraph, look for them and then copy them into my own
> xml format.
> Best
> Mark
> MSB wrote:
>> Sorry Mark, my knowledge of XWPF is VERY limited indeed. It may be best
>> to start a new thread asking specifically about XWPF if you want to get a
>> better response. Having said that though and speaking as someone with a
>> limited knowledge of such things, would it not be possible to transform
>> the xml formatted file into 'your' xml format and simply remove XWPF from
>> the equation entirely?
>> Yours
>> Mark B

