Message-ID: <24309490.post@talk.nabble.com>
Date: Thu, 2 Jul 2009 09:05:08 -0700 (PDT)
From: MSB
X-Nabble-From: markbrdsly@tiscali.co.uk
To: user@poi.apache.org
Reply-To: "POI Users List"
Mailing-List: contact user-help@poi.apache.org; run by ezmlm
Subject: RE: Use cases for MS Word files
In-Reply-To: <24301974.post@talk.nabble.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
References: <52901A4EF141AF41B085BEF645EB7579049FFC3286@HCAZMAIL2.hitachiconsulting.net>
    <24285074.post@talk.nabble.com>
    <52901A4EF141AF41B085BEF645EB7579049FFC34BF@HCAZMAIL2.hitachiconsulting.net>
    <24301974.post@talk.nabble.com>

Justin,

To keep you up to date with progress, I have only been able to spend about
an hour on the code today and it is still very, very far from working
properly. Just to give you an overview, I was simply looking to merge one
paragraph from a Word document into another Word document and, moreover, to
be able to identify which paragraph to merge and where to insert it. Once I
had this working, the plan was to add further methods that would have
allowed me to specify a list of paragraph numbers to merge from one document
into another, or even a range of the same. As it stands, the code looks like
this:

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.ParagraphProperties;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.CharacterProperties;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

/**
 *
 * @author win user
 */
public class MergeTest {

    private HWPFDocument mergeToDocument = null;
    private Range mergeToDocRange = null;
    private String mergeToDocName = null;

    public MergeTest(String mergeToDocName)
            throws FileNotFoundException, IOException {
        FileInputStream fis = new FileInputStream(new File(mergeToDocName));
        try {
            // The document is read fully here, so the stream can be closed
            // as soon as the constructor returns.
            this.mergeToDocument = new HWPFDocument(fis);
        } finally {
            fis.close();
        }
        this.mergeToDocRange = this.mergeToDocument.getRange();
        this.mergeToDocName = mergeToDocName;
    }

    /**
     * Both paragraph numbers are zero-based indices.
     */
    public void mergeParaFrom(String mergeFilename, int numParaToMerge,
            int numParaMergeAfter) throws FileNotFoundException, IOException {
        FileInputStream fis = new FileInputStream(new File(mergeFilename));
        HWPFDocument mergeFromDoc = null;
        try {
            mergeFromDoc = new HWPFDocument(fis);
        } finally {
            // The original version never closed this stream.
            fis.close();
        }
        Range docRange = mergeFromDoc.getRange();
        // An index equal to numParagraphs() is already out of range, so the
        // test has to be >= rather than >, and negatives are rejected too.
        if (numParaToMerge < 0 || numParaToMerge >= docRange.numParagraphs()) {
            throw new IllegalArgumentException("Value passed to numParaToMerge "
                    + "parameter must be a zero-based index smaller than the "
                    + "number of Paragraphs in the document.");
        }
        if (numParaMergeAfter < 0
                || numParaMergeAfter >= this.mergeToDocRange.numParagraphs()) {
            throw new IllegalArgumentException("Value passed to numParaMergeAfter "
                    + "parameter must be a zero-based index smaller than the "
                    + "number of Paragraphs in the document.");
        }
        Paragraph paraToMerge = docRange.getParagraph(numParaToMerge);
        this.mergeParaIntoDoc(
                this.mergeToDocRange.getParagraph(numParaMergeAfter),
                paraToMerge);
    }

    public void mergeParaIntoDoc(Paragraph mergeAfterPara, Paragraph toMergePara) {
        CharacterRun newCharRun = null;
        CharacterRun toMergeCharRun = null;
        CharacterProperties charProps = null;
        String text = null;
        // Insert an empty paragraph that carries a copy of the source
        // paragraph's properties.
        ParagraphProperties paraProps = toMergePara.cloneProperties();
        Range range = mergeAfterPara.insertAfter(paraProps, 0);
        System.out.println("Text: " + toMergePara.text());
        // Append each character run of the source paragraph in turn.
        int numCharRuns = toMergePara.numCharacterRuns();
        for (int i = 0; i < numCharRuns; i++) {
            toMergeCharRun = toMergePara.getCharacterRun(i);
            text = toMergeCharRun.text();
            text = CharacterRun.stripFields(text);
            charProps = toMergeCharRun.cloneProperties();
            newCharRun = range.insertAfter(text, charProps);
            //newCharRun = range.insertAfter(text);
            // The next run is inserted after the one just added.
            range = newCharRun;
        }
    }

    public void saveMergedDocument() throws FileNotFoundException, IOException {
        this.saveMergedDocument(this.mergeToDocName);
    }

    public void saveMergedDocument(String filename)
            throws FileNotFoundException, IOException {
        File outputFile = null;
        FileOutputStream fos = null;
        try {
            outputFile = new File(filename);
            fos = new FileOutputStream(outputFile);
            this.mergeToDocument.write(fos);
        } finally {
            if (fos != null) {
                try {
                    fos.close();
                    fos = null;
                } catch (Exception ex) {
                    // I G N O R E
                }
            }
        }
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        MergeTest mergeTest = null;
        try {
            mergeTest = new MergeTest("C:/temp/Merge Document.doc");
            mergeTest.mergeParaFrom("C:/temp/Source Document.doc", 2, 3);
            mergeTest.saveMergedDocument("C:/temp/Merge Results.doc");
        } catch (Exception ex) {
            System.out.println("Caught an: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows..............");
            ex.printStackTrace(System.out);
        }
    }
}

Whilst I think that the basic premise is sound - insert a new paragraph into
the document and add to it each character run from the paragraph that is
being merged - all of the 'style' information (font, size, etc.) is lost
when the paragraph is inserted, and so I think I am looking at writing
methods to deep copy the CharacterProperties and most likely the
ParagraphProperties as well. I will take a look at the source for the
cloneProperties() methods first, though, for clues.

If it is possible to get this to work, there are still going to be lots of
other problems: pictures, tables, OLE objects, what happens if the text to
be merged is arranged into columns, and so on. I will keep playing with the
code when I have the time - and once it cools down a little around here -
and let you know what happens; as before, though, I cannot promise when this
will be. I am also going to look into an alternative approach where
paragraphs are extracted from documents and merged to form a new document -
it could be tricky but might work.

Yours

Mark B


MSB wrote:
>
> Hello Justin,
>
> Not to hand, no I do not. Having said that, I am quite willing to try and
> put something together but cannot promise a time scale, sorry. If I have
> any time today, I will look into writing something. Can I just ask how you
> want to perform the merge? Do you want to simply copy text from one
> document into an existing document or do you want to take some text from
> two or more documents and merge that into a new document?
>
> Thinking a little bit more overnight, the answer to merging documents
> ought to have been 'yes but with a caveat'; fonts could be an issue but I
> am not at all sure about this and it would require testing. I am thinking
> here about a document that could have been created on another machine
> entirely and then emailed to you; if it uses an obscure font then we could
> face a problem. However, this is hard to prove until some testing is
> undertaken.
>
> Yours
>
> Mark B
>
>
> Beltran, Justin wrote:
>>
>> Hi Mark,
>>
>> Do you have any examples of how to merge different Word documents? I've
>> seen code to parse a Word doc, but not how to merge different documents.
>>
>> Justin
>>
>> -----Original Message-----
>> From: MSB [mailto:markbrdsly@tiscali.co.uk]
>> Sent: Tuesday, June 30, 2009 11:56 PM
>> To: user@poi.apache.org
>> Subject: Re: Use cases for MS Word files
>>
>>
>> Morning Justin,
>>
>> I think that the answers to your questions are yes, yes, no and no, in
>> that order. Do not take this as the final answer, however, as I have not
>> used HWPF\XWPF for a while now and the project could have advanced since
>> that time.
>>
>> As for other open source APIs, there is not another one that I am aware
>> of which targets both the binary and OpenXML file formats. There is the
>> OpenXML4j project at Sourceforge
>> (http://sourceforge.net/projects/openxml4j/) but this is 'limited' to
>> just the XML based file format. Also, I have not used that tool so cannot
>> speak to its feature set, sorry. Of course, there are commercial tools -
>> Aspose is the one that springs to mind.
>>
>> While OLE might have been an option if you were targeting just Windows
>> platforms, OpenOffice could offer you an alternative. It is open source
>> and platform independent but quite large to deploy. UNO is not an easy
>> technique/interface to learn and I do not have complete confidence in
>> OpenOffice's abilities to accurately render complex documents, at least
>> in the binary (OLE2CDF) file format. Further, applications that use it
>> can be quite slow because you will actually be manipulating an instance
>> of the application rather than creating a file. Finally, there are
>> complications if you want to run it in a client server configuration, as
>> you will need to create what is termed a 'connection aware' client at the
>> very least.
>>
>> If you have the time, it might be worth seeing what would be required to
>> add the necessary capabilities into HWPF\XWPF. I am certain there are
>> others who would like to see this sort of functionality and would be
>> delighted if you could join the development team and contribute patches.
>>
>> Yours
>>
>> Mark B
>>
>>
>> Beltran, Justin wrote:
>>>
>>> Hi all,
>>>
>>> I'm doing initial research on a project and I'm trying to see how
>>> mature the capabilities are in POI in regards to the following:
>>>
>>> 1. Parsing text in documents (i.e. in paragraphs, tables, etc.)
>>> 2. Merging different Word documents
>>> 3. Creating hyperlinks (not to external URLs, but to other places in
>>>    the document)
>>> 4. Creating table of contents
>>>
>>> If POI currently doesn't have these capabilities, are there any other
>>> open source Java packages that can deliver the same functionality?
>>> Thanks in advance!
>>>
>>> Justin
>>>
>>>
>>> This e-mail is intended solely for the person or entity to which it is
>>> addressed and may contain confidential and/or privileged information.
>>> Any review, dissemination, copying, printing or other use of this
>>> e-mail by persons or entities other than the addressee is prohibited.
>>> If you have received this e-mail in error, please contact the sender
>>> immediately and delete the material from any computer.
>>> To unsubscribe send an email to: Unsubscribe@hitachiconsulting.com
>>> Hitachi Consulting Corporation, 2001 Bryan Street, Suite 3600, Dallas,
>>> Texas 75201
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Use-cases-for-MS-Word-files-tp24281577p24285074.html
>> Sent from the POI - User mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>>
>

--
View this message in context: http://www.nabble.com/Use-cases-for-MS-Word-files-tp24281577p24309490.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
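
The message mentions a plan to accept a list of paragraph numbers, or a
range of them, to merge. A minimal sketch of the index handling such methods
would need is given here in plain Java, without POI, so that it runs on its
own. The class ParagraphSelection and its methods are hypothetical names and
are not part of POI. The only thing carried over from the code in the
message is that Range.getParagraph() takes a zero-based index, so valid
values run from 0 to numParagraphs() - 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for choosing paragraphs to merge; not part of POI. */
final class ParagraphSelection {

    private ParagraphSelection() {
    }

    /** Rejects any index outside 0 .. paragraphCount - 1. */
    static void checkIndex(int index, int paragraphCount, String paramName) {
        if (index < 0 || index >= paragraphCount) {
            throw new IllegalArgumentException("Value passed to " + paramName
                    + " must be between 0 and " + (paragraphCount - 1)
                    + " but was " + index + ".");
        }
    }

    /** Expands the inclusive range first..last into individual indices. */
    static List<Integer> expandRange(int first, int last, int paragraphCount) {
        checkIndex(first, paragraphCount, "first");
        checkIndex(last, paragraphCount, "last");
        if (first > last) {
            throw new IllegalArgumentException(
                    "first must not be greater than last.");
        }
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = first; i <= last; i++) {
            indices.add(i);
        }
        return indices;
    }

    public static void main(String[] args) {
        // A document with 10 paragraphs has indices 0 to 9.
        System.out.println(expandRange(2, 4, 10)); // prints [2, 3, 4]
        try {
            // 10 equals the paragraph count, so it is one past the end.
            checkIndex(10, 10, "numParaToMerge");
        } catch (IllegalArgumentException ex) {
            System.out.println("Rejected: " + ex.getMessage());
        }
    }
}
```

Each index returned by expandRange could then be passed in turn to a method
such as mergeParaFrom(). Validating the whole range before any paragraph is
inserted means a bad argument fails before the target document is modified.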