poi-user mailing list archives

From MSB <markbrd...@tiscali.co.uk>
Subject Re: Extract Text with style/type information
Date Tue, 19 Jan 2010 08:48:37 GMT

Not easily, no. By this, I mean that there is no single method you can call
to, for example, print out all of the information about a given section of
the document.

But you can get at detailed information by digging around a little in the
various methods, although a lot depends on exactly how you want to process
the document. It is possible, for example, to get at all of the tables in the
document, or all of the pictures, but these method calls remove some of the
context; you cannot tell what comes before or after the picture or table.
If you search through the posts on this list, you will find some code we put
together that lets you get at the tables, as one example, as they occur in
the document; it is simply a matter of asking whether the Paragraph object
appeared in a table cell or not.
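
As a rough sketch of that idea (not the code from the earlier posts), the
loop below walks paragraphs in document order and marks where a run of
in-table paragraphs starts and ends, so the surrounding context is kept. To
keep it runnable without POI on the classpath, Para is a hypothetical
stand-in class; with HWPF you would iterate a Range and call
Paragraph.text() and Paragraph.isInTable() instead. DocumentWalk and
describe are names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DocumentWalk {

    // Hypothetical stand-in for an HWPF Paragraph: only what this sketch needs.
    // With HWPF, use Paragraph.text() and Paragraph.isInTable() instead.
    static final class Para {
        final String text;
        final boolean inTable;

        Para(String text, boolean inTable) {
            this.text = text;
            this.inTable = inTable;
        }
    }

    // Walk the paragraphs in document order and emit one line per paragraph,
    // plus markers where a run of table paragraphs begins and ends.
    static List<String> describe(List<Para> paragraphs) {
        List<String> out = new ArrayList<>();
        boolean inTable = false;
        for (Para p : paragraphs) {
            if (p.inTable && !inTable) {
                out.add("TABLE START");
            }
            if (!p.inTable && inTable) {
                out.add("TABLE END");
            }
            inTable = p.inTable;
            out.add((p.inTable ? "  cell: " : "text: ") + p.text);
        }
        if (inTable) {
            out.add("TABLE END"); // document ended while still inside a table
        }
        return out;
    }

    public static void main(String[] args) {
        List<Para> doc = Arrays.asList(
                new Para("Introduction", false),
                new Para("Row 1, cell 1", true),
                new Para("Row 1, cell 2", true),
                new Para("Closing remarks", false));
        describe(doc).forEach(System.out::println);
    }
}
```

Because the table markers are emitted inline, you can see which ordinary
paragraphs come before and after each table.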

If you can be more precise about exactly what information you want printing
out about each different type of object then it may be possible to give you
a better answer. Further, it is important to know which type of file you are
targeting - binary (.doc) or OpenXML (.docx) - as HWPF and XWPF have
different capabilities. Finally, you do need to be aware that HWPF in
particular is still a very immature API that is in need of a lot of
development; if you would be willing to undertake that work and develop
those areas that you require, I am certain that there will be a lot of
grateful users.
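
On the question of file type, here is a minimal sketch that guesses the
container format from the first few bytes of a file rather than trusting the
extension. It relies only on the well-known OLE2 and ZIP signatures; the
class and method names (WordFormatSniffer, sniff) are made up for this
example. The result only tells you which container you have, so "HWPF" and
"XWPF" here are hints about which POI component to try, not proof that the
file is a Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class WordFormatSniffer {

    // OLE2 compound file signature, the container used by binary .doc files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature; a .docx file is a ZIP package.
    private static final byte[] ZIP = {'P', 'K', 3, 4};

    // Returns "HWPF", "XWPF" or "UNKNOWN" as a hint for which API to try.
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2)) {
            return "HWPF";
        }
        if (startsWith(header, ZIP)) {
            return "XWPF";
        }
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java WordFormatSniffer <file>");
            return;
        }
        byte[] buf = new byte[8];
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int n = in.read(buf); // a short read is treated as a short header
            System.out.println(sniff(Arrays.copyOf(buf, Math.max(n, 0))));
        }
    }
}
```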

Yours

Mark B


markl16 wrote:
> 
> Hi everyone,
> 
> I'm just researching Apache POI at the moment. I have done some simple Java
> programs, reading in a Word document and printing out the text etc. 
> 
> I'm just wondering whether it is possible to get style information for each
> paragraph in the Word document, such as POI printing out whether the
> paragraph is a title header, a list of bullet points, an image, a table etc.
> I have come across range.getCharacterRun() which can provide some info
> such as font type, but I'm looking for more detailed information as
> mentioned above.
> 
> Any feedback appreciated.
> 
> Best
> Mark
> 
> 
> 


