Mailing-List: contact user-help@poi.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "POI Users List" <user@poi.apache.org>
Received-SPF: pass (athena.apache.org: domain of lists@nabble.com designates
 216.139.236.158 as permitted sender)
Message-ID: <24309490.post@talk.nabble.com>
Date: Thu, 2 Jul 2009 09:05:08 -0700 (PDT)
From: MSB <markbrdsly@tiscali.co.uk>
To: user@poi.apache.org
Subject: RE: Use cases for MS Word files
In-Reply-To: <24301974.post@talk.nabble.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
References: 
 <52901A4EF141AF41B085BEF645EB7579049FFC3286@HCAZMAIL2.hitachiconsulting.net>
 <24285074.post@talk.nabble.com>
 <52901A4EF141AF41B085BEF645EB7579049FFC34BF@HCAZMAIL2.hitachiconsulting.net>
 <24301974.post@talk.nabble.com>


Justin,

To keep you up to date with progress: I have only been able to spend about
an hour on the code today and it is still very far from working properly.
To give you an overview, I was simply looking to merge one paragraph from a
Word document into another Word document and, moreover, to be able to
identify which paragraph to merge and where to insert it. Once I had this
working, the plan was to add further methods that would allow me to specify
a list of paragraph numbers to merge from one document into another, or even
a range of them. As it stands, the code looks like this:

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Section;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.ParagraphProperties;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.CharacterProperties;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

/**
 *
 * @author win user
 */
public class MergeTest {

    private HWPFDocument mergeToDocument = null;
    private Range mergeToDocRange = null;
    private String mergeToDocName = null;

    public MergeTest(String mergeToDocName)
            throws FileNotFoundException, IOException {
        File mergeToFile = new File(mergeToDocName);
        FileInputStream fis = new FileInputStream(mergeToFile);
        this.mergeToDocument = new HWPFDocument(fis);
        this.mergeToDocRange = this.mergeToDocument.getRange();
        fis.close();
        fis = null;
        this.mergeToDocName = mergeToDocName;
    }

    public void mergeParaFrom(String mergeFilename, int numParaToMerge,
                              int numParaMergeAfter)
            throws FileNotFoundException, IOException {
        File mergeFromFile = new File(mergeFilename);
        FileInputStream fis = new FileInputStream(mergeFromFile);
        HWPFDocument mergeFromDoc = null;
        try {
            mergeFromDoc = new HWPFDocument(fis);
        }
        finally {
            // Close the stream once the document has been read, as the
            // constructor above does.
            fis.close();
        }
        Range docRange = mergeFromDoc.getRange();
        // Paragraph indices are zero based, so an index equal to
        // numParagraphs() is already out of range.
        if(numParaToMerge < 0 ||
                numParaToMerge >= docRange.numParagraphs()) {
            throw new IllegalArgumentException("Value passed to " +
                    "numParaToMerge parameter is not a valid paragraph " +
                    "index for the source document.");
        }
        if(numParaMergeAfter < 0 ||
                numParaMergeAfter >= this.mergeToDocRange.numParagraphs()) {
            throw new IllegalArgumentException("Value passed to " +
                    "numParaMergeAfter parameter is not a valid paragraph " +
                    "index for the target document.");
        }
        Paragraph paraToMerge = docRange.getParagraph(numParaToMerge);
        this.mergeParaIntoDoc(
                this.mergeToDocRange.getParagraph(numParaMergeAfter),
                paraToMerge);
    }

    public void mergeParaIntoDoc(Paragraph mergeAfterPara,
                                 Paragraph toMergePara) {
        CharacterRun newCharRun = null;
        CharacterRun toMergeCharRun = null;
        CharacterProperties charProps = null;
        String text = null;
        ParagraphProperties paraProps = toMergePara.cloneProperties();
        Range range = mergeAfterPara.insertAfter(paraProps, 0);
        System.out.println("Text: " + toMergePara.text());
        int numCharRuns = toMergePara.numCharacterRuns();
        for(int i = 0; i < numCharRuns; i++) {
            toMergeCharRun = toMergePara.getCharacterRun(i);
            text = toMergeCharRun.text();
            text = CharacterRun.stripFields(text);
            charProps = toMergeCharRun.cloneProperties();
            newCharRun = range.insertAfter(text, charProps);
            //newCharRun = range.insertAfter(text);
            range = newCharRun;
        }
    }

    public void saveMergedDocument()
            throws FileNotFoundException, IOException {
        this.saveMergedDocument(this.mergeToDocName);
    }

    public void saveMergedDocument(String filename)
            throws FileNotFoundException, IOException {
        File outputFile = null;
        FileOutputStream fos = null;
        try {
            outputFile = new File(filename);
            fos = new FileOutputStream(outputFile);
            this.mergeToDocument.write(fos);
        }
        finally {
            if(fos != null) {
                try {
                    fos.close();
                    fos = null;
                }
                catch(Exception ex) {
                    // I G N O R E
                }
            }
        }
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        MergeTest mergeTest = null;
        try {
            mergeTest = new MergeTest("C:/temp/Merge Document.doc");
            mergeTest.mergeParaFrom("C:/temp/Source Document.doc", 2, 3);
            mergeTest.saveMergedDocument("C:/temp/Merge Results.doc");
        }
        catch(Exception ex) {
            System.out.println("Caught an: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows..............");
            ex.printStackTrace(System.out);
        }
    }
}

Whilst I think that the basic premise is sound (insert a new paragraph into
the document and add to it each character run from the paragraph that is
being merged), all of the 'style' information - font, size, etc. - is lost
when the paragraph is inserted, so I think I am looking at writing methods
to deep copy the CharacterProperties and most likely the ParagraphProperties
as well. I will take a look at the source for the cloneProperties() methods
first, though, for clues. Even if it is possible to get this to work, there
are still going to be lots of other problems: pictures, tables, OLE objects,
what happens if the text to be merged is arranged into columns, and so on.
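
To make clear what I mean by a deep copy, here is a minimal sketch in plain
Java. RunStyle and FontInfo are hypothetical stand-ins that I have made up
purely for illustration; they are not POI classes, and the sketch makes no
claim about how CharacterProperties is laid out or about what
cloneProperties() actually does. The point is only that a deep copy
duplicates nested objects as well, so changing the copy cannot alter the
original.

```java
/** Hypothetical nested style object; not a POI class. */
final class FontInfo {
    String name;
    int halfPoints; // font size in half points, an assumption for this sketch

    FontInfo(String name, int halfPoints) {
        this.name = name;
        this.halfPoints = halfPoints;
    }

    /** Copy constructor: builds an independent FontInfo. */
    FontInfo(FontInfo other) {
        this(other.name, other.halfPoints);
    }
}

/** Hypothetical stand-in for a character run's properties; not a POI class. */
final class RunStyle {
    boolean bold;
    FontInfo font;

    RunStyle(boolean bold, FontInfo font) {
        this.bold = bold;
        this.font = font;
    }

    /** Shallow copy: the new RunStyle shares the same FontInfo instance. */
    RunStyle shallowCopy() {
        return new RunStyle(this.bold, this.font);
    }

    /** Deep copy: the nested FontInfo is duplicated as well. */
    RunStyle deepCopy() {
        return new RunStyle(this.bold, new FontInfo(this.font));
    }
}
```

With the shallow copy, changing the font on the copy would also change it on
the original, because both point at the same FontInfo; with the deep copy
the two are independent.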

I will keep playing with the code when I have the time - and once it cools
down a little around here - and let you know what happens; as before though,
I cannot promise when this will be. I am also going to look into an
alternative approach where paragraphs are extracted from documents and
merged to form a new document - could be tricky but might work.
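
On the earlier plan to accept a list or a range of paragraph numbers, the
index handling could be kept apart from the POI calls. What follows is only
a minimal sketch, assuming zero-based indices as used by getParagraph();
ParagraphSelection and its method names are hypothetical and are not part of
POI.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for choosing paragraphs to merge; not part of POI. */
final class ParagraphSelection {

    private ParagraphSelection() { }

    /** Expands an inclusive, zero-based range into a list of indices. */
    static List<Integer> range(int first, int last, int numParagraphs) {
        if (first > last) {
            throw new IllegalArgumentException(
                    "first (" + first + ") is greater than last (" + last + ")");
        }
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = first; i <= last; i++) {
            indices.add(checkIndex(i, numParagraphs));
        }
        return indices;
    }

    /** Validates an explicit list of indices, keeping the caller's order. */
    static List<Integer> list(int numParagraphs, int... indices) {
        List<Integer> checked = new ArrayList<Integer>();
        for (int index : indices) {
            checked.add(checkIndex(index, numParagraphs));
        }
        return checked;
    }

    /** Rejects any index outside 0 .. numParagraphs - 1. */
    private static int checkIndex(int index, int numParagraphs) {
        if (index < 0 || index >= numParagraphs) {
            throw new IllegalArgumentException("Paragraph index " + index +
                    " is outside the range 0 to " + (numParagraphs - 1));
        }
        return index;
    }
}
```

Each index in the returned list could then be passed in turn to something
like mergeParaFrom(), bearing in mind that every insert shifts the paragraph
numbering of the document being merged into.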

Yours

Mark B


MSB wrote:
> 
> Hello Justin,
> 
> Not to hand, no I do not. Having said that I am quite willing to try and
> put something together but cannot promise a time scale, sorry. If I have
> any time today, I will look into writing something. Can I just ask how you
> want to perform the merge? Do you want to simply copy text from one
> document into an existing document or do you want to take some text from
> two or more documents and merge that into a new document?
> 
> Thinking a little bit more overnight, the answer to merging documents
> ought to have been 'yes but with a caveat'; fonts could be an issue but I
> am not at all sure about this and it would require testing. I am thinking
> here about a document that could have been created on another machine
> entirely and then emailed to you; if it uses an obscure font then we could
> face a problem. However, this is hard to prove until some testing is
> undertaken.
> 
> Yours
> 
> Mark B
> 
> 
> Beltran, Justin wrote:
>> 
>> Hi Mark,
>> 
>> Do you have any examples of how to merge different word documents?  I've
>> seen code to parse a word doc, but not how to merge different documents.
>> 
>> Justin
>> 
>> -----Original Message-----
>> From: MSB [mailto:markbrdsly@tiscali.co.uk] 
>> Sent: Tuesday, June 30, 2009 11:56 PM
>> To: user@poi.apache.org
>> Subject: Re: Use cases for MS Word files
>> 
>> 
>> Morning Justin,
>> 
>> I think that the answers to your questions are yes, yes, no and no, in
>> that order. Do not take this as the final answer, however, as I have not
>> used HWPF\XWPF for a while now and the project could have advanced since
>> that time.
>> 
>> As for other open source APIs, there is not another one that I am aware
>> of which targets both the binary and OpenXML file formats. There is the
>> OpenXML4j project at Sourceforge
>> (http://sourceforge.net/projects/openxml4j/) but this is 'limited' to
>> just the XML based file format. Also, I have not used that tool so cannot
>> speak to its feature set, sorry. Of course, there are commercial tools -
>> Aspose is the one that springs to mind.
>> 
>> While OLE might have been an option if you were targeting just Windows
>> platforms, OpenOffice could offer you an alternative. It is open source
>> and platform independent but quite large to deploy. UNO is not an easy
>> technique/interface to learn and I do not have complete confidence in
>> OpenOffice's ability to accurately render complex documents, at least
>> in the binary (OLE2CDF) file format. Further, applications that use it can
>> be quite slow because you will actually be manipulating an instance of the
>> application rather than creating a file. Finally, there are complications
>> if you want to run it in a client server configuration as you will need to
>> create what is termed a 'connection aware' client at the very least.
>> 
>> If you have the time, it might be worth seeing what would be required to
>> add the necessary capabilities into HWPF\XWPF. I am certain there are
>> others who would like to see this sort of functionality and would be
>> delighted if you could join the development team and contribute patches.
>> 
>> Yours
>> 
>> Mark B
>> 
>> 
>> Beltran, Justin wrote:
>>> 
>>> Hi all,
>>> 
>>> I'm doing initial research on a project and I'm trying to see how
>>> mature the capabilities are in POI with regard to the following:
>>> 
>>> 
>>> 1.       Parsing text in documents (i.e. in paragraphs, tables, etc.)
>>> 
>>> 2.       Merging different word documents
>>> 
>>> 3.       Creating hyperlinks (not to external URLs, but to other places
>>> in the document)
>>> 
>>> 4.       Creating table of contents
>>> 
>>> If POI currently doesn't have these capabilities, are there any other
>>> open source Java packages that can deliver the same functionality?  Thanks
>>> in advance!
>>> 
>>> Justin
>>> 
>>> 
>>> 
>>> 
>>> This e-mail is intended solely for the person or entity to which it is
>>> addressed
>>> and may contain confidential and/or privileged information. Any review,
>>> dissemination,
>>> copying, printing or other use of this e-mail by persons or entities
>>> other
>>> than the 
>>> addressee is prohibited. If you have received this e-mail in error,
>>> please
>>> contact
>>> the sender immediately and delete the material from any computer.
>>> To unsubscribe send an email to: Unsubscribe@hitachiconsulting.com 
>>> Hitachi Consulting Corporation, 2001 Bryan Street, Suite 3600, Dallas,
>>> Texas 75201
>>> 
>>> 
>>> 
>> 
>> -- 
>> View this message in context:
>> http://www.nabble.com/Use-cases-for-MS-Word-files-tp24281577p24285074.html
>> Sent from the POI - User mailing list archive at Nabble.com.
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>> 
>> 
>> 
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Use-cases-for-MS-Word-files-tp24281577p24309490.html
Sent from the POI - User mailing list archive at Nabble.com.

