From: MSB
To: user@poi.apache.org
Subject: Re: Extract Text with style/type information
Date: Tue, 19 Jan 2010 00:48:37 -0800 (PST)
Message-ID: <27222594.post@talk.nabble.com>
In-Reply-To: <27209960.post@talk.nabble.com>

Not easily, no.
By this, I mean that there is no single method you can call to say, for example, "print out all of the information about this section of the document". You can, however, get at detailed information by digging around a little in the various methods, although a lot depends on exactly how you want to process the document. It is possible, for example, to get at all of the tables in the document or all of the pictures, but these method calls remove some of the context; you cannot tell what comes before or after the picture or table.

If you have a good search through the posts in the list, you will find some code we put together that allows you to get at the tables - just as an example - as they occur in the document; it is simply a matter of asking whether the Paragraph object appeared in a table cell or not.

If you can be more precise about exactly what information you want printed out about each different type of object, then it may be possible to give you a better answer. Further, it is important to know which type of file you are targeting - binary (.doc) or OpenXML (.docx) - as HWPF and XWPF have different capabilities.

Finally, you do need to be aware that HWPF in particular is still a very immature API that is in need of a lot of development; if you would be willing to undertake that work and develop those areas that you require, I am certain that there will be a lot of grateful users.

Yours

Mark B

markl16 wrote:
>
> Hi everyone,
>
> I'm just researching Apache POI at the moment. I have done some simple Java
> programs, reading in a Word document and printing out the text etc.
>
> I'm just wondering, is it possible to get style information based on each
> paragraph in the Word document, such as POI printing out if the paragraph
> is a title header, or a list of bullet points, or an image, table etc.? I
> have come across range.getCharacterRun() which can provide some info
> such as font type, but I'm looking for more detailed information as
> mentioned above.
>
> Any feedback appreciated.
>
> Best
> Mark
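
PS. On the point about asking each paragraph whether it sits in a table cell: below is a minimal sketch of how such a per-paragraph check might be organised, so that the paragraphs are labelled in document order and the context is not lost. The class ParagraphKinds, the ParaInfo holder and the labels are all hypothetical names made up for illustration; none of them is part of POI. The HWPF calls named in the comments are how I would expect the holder to be filled for a .doc file, but that wiring is an assumption and should be checked against the POI release you are using.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of POI: labels paragraphs in document order.
class ParagraphKinds {

    // Plain holder for the values you would read from each HWPF Paragraph.
    // Assumed wiring (verify against your POI version):
    //   Paragraph p = range.getParagraph(i);
    //   text       <- p.text()
    //   styleName  <- doc.getStyleSheet().getStyleDescription(p.getStyleIndex()).getName()
    //   inTable    <- p.isInTable()
    //   inList     <- derived from the paragraph's list properties
    //   hasPicture <- doc.getPicturesTable().hasPicture(run) for any CharacterRun in p
    static final class ParaInfo {
        final String text;
        final String styleName;
        final boolean inTable;
        final boolean inList;
        final boolean hasPicture;

        ParaInfo(String text, String styleName, boolean inTable,
                 boolean inList, boolean hasPicture) {
            this.text = text;
            this.styleName = styleName;
            this.inTable = inTable;
            this.inList = inList;
            this.hasPicture = hasPicture;
        }
    }

    // Decide on a label for one paragraph. The table check comes first because
    // a paragraph inside a cell may also carry a list or heading style.
    static String classify(ParaInfo p) {
        if (p.inTable) {
            return "TABLE_CELL";
        }
        if (p.hasPicture) {
            return "PICTURE";
        }
        if (p.inList) {
            return "LIST_ITEM";
        }
        // Assumption: headings use the built-in "Heading N" style names.
        if (p.styleName != null
                && p.styleName.toLowerCase(Locale.ROOT).startsWith("heading")) {
            return "HEADING";
        }
        return "BODY";
    }

    // Label every paragraph, preserving the order in which they occur.
    static List<String> classifyAll(List<ParaInfo> paragraphs) {
        List<String> labels = new ArrayList<String>();
        for (ParaInfo p : paragraphs) {
            labels.add(classify(p));
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hand-made sample data standing in for a real document.
        List<ParaInfo> sample = new ArrayList<ParaInfo>();
        sample.add(new ParaInfo("Introduction", "Heading 1", false, false, false));
        sample.add(new ParaInfo("Some body text.", "Normal", false, false, false));
        sample.add(new ParaInfo("First point", "Normal", false, true, false));
        sample.add(new ParaInfo("Cell contents", "Normal", true, false, false));

        List<String> labels = classifyAll(sample);
        for (int i = 0; i < sample.size(); i++) {
            System.out.println(labels.get(i) + ": " + sample.get(i).text);
        }
    }
}
```

Because the labels come out in the same order as the paragraphs, you can still tell what precedes and follows each table cell or picture, which is what the bulk "give me all the tables" calls lose. For .docx files the holder would have to be filled from XWPF instead, and the available properties differ.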