poi-dev mailing list archives

From MSB <markbrd...@tiscali.co.uk>
Subject Re: How to compare 2 word doc (OLE2CDF or OpenXML).
Date Tue, 28 Jul 2009 15:29:12 GMT

I have now added the method that saves the results to a file, but I had to
make one or two hacks to coerce the code into working properly. The first of
these relates to the colour of the text, and I was forced into it because
HWPF cannot tell me what the index for a specific colour is. If you look
into the ComparisonResult class, you will see a static method right at the
bottom of the file called getColour(). It takes the type of the comparison
result as a parameter and returns a primitive int based upon this value. I
ran a few tests using the template file and found that 2 was the index for
dark blue, 11 the index for dark green, 6 the index for red and 5 the index
for purple. This is far from ideal, but it does seem to work and it may even
do so for your requirement; I guess it all comes down to how you will be
running the code. I would assume that as long as the code is run on the same
computer or uses the same template file, this technique will be safe and
reliable.
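
As a minimal sketch of that kind of lookup: the four index values are the
ones quoted above, and the comparison type codes are those declared in
ComparisonResult. The class name ResultColours and, in particular, which
comparison type is paired with which colour are assumptions made purely for
illustration; the real getColour() may pair them differently, and the
indices may not hold for another template file.

```java
/** Hypothetical helper; names and colour pairings are illustrative only. */
final class ResultColours {

    // Colour indices found by experiment against one template file.
    static final int DARK_BLUE = 2;
    static final int PURPLE = 5;
    static final int RED = 6;
    static final int DARK_GREEN = 11;

    // Comparison types, same values as the constants in ComparisonResult.
    static final int INSERTED = 0;
    static final int DELETED = 1;
    static final int MODIFIED = 2;
    static final int MOVED = 3;

    /**
     * Maps a comparison type to a colour index. The pairing chosen here is
     * an assumption; only the index values come from the tests described.
     */
    static int getColour(int comparisonResult) {
        switch (comparisonResult) {
            case INSERTED: return DARK_GREEN;
            case DELETED:  return RED;
            case MODIFIED: return PURPLE;
            case MOVED:    return DARK_BLUE;
            default:
                throw new IllegalArgumentException(
                        "Unknown comparison result: " + comparisonResult);
        }
    }

    public static void main(String[] args) {
        System.out.println("Colour index for DELETED: " + getColour(DELETED));
    }
}
```

Keeping the magic numbers behind named constants means that, if a different
template turns out to use different indices, only one place has to change.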

The second assumption I made is that the results should be sorted so that
details of all the inserted paragraphs are printed first, followed by those
of all the deleted, modified and moved paragraphs. If you do not want this,
simply comment out this line:

java.util.Collections.sort(comparisonResults);

in the saveResults() method of the DocumentComparator class.
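
For reference, a sort that gives this order could look like the sketch
below. It assumes the results are ordered by their int type codes
(INSERTED = 0, DELETED = 1, MODIFIED = 2, MOVED = 3, as declared in
ComparisonResult). The Result class and sortedTexts() are hypothetical
stand-ins so that the example runs without POI; they are not the attached
ComparisonResult and its compareTo() may be written differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in classes; they only illustrate the ordering. */
final class SortSketch {

    static final int INSERTED = 0;
    static final int DELETED = 1;
    static final int MODIFIED = 2;
    static final int MOVED = 3;

    /** Holds just the comparison type and the paragraph text. */
    static final class Result implements Comparable<Result> {
        final int type;
        final String text;

        Result(int type, String text) {
            this.type = type;
            this.text = text;
        }

        // Order by type code only: inserted, deleted, modified, moved.
        @Override
        public int compareTo(Result other) {
            return Integer.compare(this.type, other.type);
        }
    }

    /** Returns the texts in sorted order without altering the input list. */
    static List<String> sortedTexts(List<Result> results) {
        List<Result> copy = new ArrayList<>(results);
        // Collections.sort is stable, so results of the same type keep
        // the order in which they were found.
        Collections.sort(copy);
        List<String> texts = new ArrayList<>();
        for (Result result : copy) {
            texts.add(result.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        List<Result> results = new ArrayList<>();
        results.add(new Result(MOVED, "moved paragraph"));
        results.add(new Result(INSERTED, "inserted paragraph"));
        results.add(new Result(DELETED, "deleted paragraph"));
        System.out.println(sortedTexts(results));
    }
}
```

Commenting out the sort simply leaves the results in the order in which the
comparison discovered them.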

The third assumption was that even empty paragraphs are important. That may
not be the case, and there is a simple way to prevent them from being
loaded. Simply modify this line in the loadDocument() method:

docParts.add(new DocumentPart(para));

You will need to add a check to see whether the paragraph contains text; if
it does, then add it to the docParts ArrayList.
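
One way such a check might look is sketched below. ParagraphFilter and
hasText() are hypothetical names, not part of the attached classes or of
POI, and treating a paragraph that holds only whitespace or control
characters as "empty" is an assumption about what you would want.

```java
/** Hypothetical helper; not part of the attached classes or of POI. */
final class ParagraphFilter {

    /**
     * Returns true if the paragraph text holds anything other than
     * whitespace or control characters. String.trim() removes every
     * leading and trailing character up to and including U+0020, which
     * covers the '\r' that terminates a paragraph.
     */
    static boolean hasText(String paragraphText) {
        return paragraphText != null && !paragraphText.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("Empty paragraph kept? " + hasText("\r"));
        System.out.println("Text paragraph kept? " + hasText("Some text\r"));
    }
}
```

In loadDocument() the add would then be guarded along the lines of
if(ParagraphFilter.hasText(para.text())) { docParts.add(new
DocumentPart(para)); }, where para.text() is the same call the attached code
already uses to read a paragraph's text.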

The code still does not do everything you want; tables are completely
ignored, modified paragraphs are not yet checked and there is not even a nod
in the direction of pictures, but it may be enough to prove that the
approach could work. I will not do any more work on the code for now, to
give you the chance to have a good look through it and check that it
accomplishes something of what you are after.

Yours

Mark B

package comparedocuments;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.ArrayList;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.ParagraphProperties;
import org.apache.poi.hwpf.usermodel.CharacterRun;

/**
 * An instance of this class can be used to perform a comparison between two
 * binary (OLE2CDF) Microsoft Word documents.
 *
 * @author Mark B
 * @version 1.00 27th July 2009
 */
public class DocumentComparator {
    
    /**
     * Called to compare the two documents and output the results of the
     * comparison to a third Microsoft Word document.
     * 
     * @param originalDoc The path to and name of the original document, the
     *                    document that is the basis for the comparison.
     * @param compareToDoc The path to and name of the document that should
     *                     be compared with the original for any
     *                     modifications.
     * @param resultDocName The path to and name of the document that should
     *                      contain the results of the comparison process.
     * @param docTemplate The path to and name of the empty Word document
     *                    that should be used as the basis for the results
     *                    document.
     * @throws java.io.IOException Thrown to signal that some sort of I/O
     *                             Exception has occurred.
     * @throws java.io.FileNotFoundException Thrown to signal that a file
     *                                       could not be located.
     */
    public void compareDocuments(String originalDoc, String compareToDoc,
                                 String resultDocName, String docTemplate)
                                 throws IOException, FileNotFoundException {
        ArrayList<DocumentPart> originalDocParts =
                this.loadDocument(originalDoc);
        ArrayList<DocumentPart> compareToDocParts =
                this.loadDocument(compareToDoc);
        ArrayList<ComparisonResult> comparisonResults =
                this.performComparison(originalDocParts, compareToDocParts);
        if(!(comparisonResults.isEmpty())) {
            this.saveResults(comparisonResults, docTemplate, resultDocName);
        }
    }
    
    /**
     * Opens a named binary (OLE2CDF) Microsoft Word document and converts
     * that document's contents into an ArrayList of instances of the
     * DocumentPart class.
     * @param docName The path to and name of a Microsoft Word document
     *                file.
     * @return An instance of the ArrayList class encapsulating instances
     *         of the DocumentPart class. Each DocumentPart will encapsulate
     *         information about a paragraph of text or a table recovered
     *         from the Microsoft Word document.
     * @throws java.io.IOException If an I/O Exception occurs
     * @throws java.io.FileNotFoundException Thrown to indicate that the
     *                                       named Microsoft Word file could
     *                                       not be located.
     */
    private ArrayList<DocumentPart> loadDocument(String docName)
            throws IOException, FileNotFoundException {
        File file = null;
        FileInputStream fis = null;
        HWPFDocument document = null;
        Range overallRange = null;
        Paragraph para = null;
        int numParas = 0;
        boolean inTable = false;
        ArrayList<DocumentPart> docParts = null;
        try {
            // Open the Word file.
            file = new File(docName);
            fis = new FileInputStream(file);
            document = new HWPFDocument(fis);
            // Get the overall Range for the document and the number
            // of paragraphs from this Range.
            overallRange = document.getOverallRange();
            numParas = overallRange.numParagraphs();
            docParts = new ArrayList<DocumentPart>(numParas);
            for(int i = 0; i < numParas; i++) {
                para = overallRange.getParagraph(i);
                // Is the paragraph 'in' a table? If so, it is possible to
                // recover a reference to that Table from the first
                // paragraph only. If calls are made to the getTable() method
                // using subsequent paragraphs then an exception will be
                // thrown. So, after getting the Table, a flag is set to
                // prevent further calls to the getTable() method.
                if(para.isInTable()) {
                    if(!inTable) {
                        // Get a reference to the Table and pass it to the
                        // constructor of the DocumentPart class. Add the
                        // DocumentPart instance to the ArrayList.
                        docParts.add(new DocumentPart(
                                overallRange.getTable(para)));
                        inTable = true;
                    }
                }
                // The paragraph is not in a table so simply add a new
                // instance to the ArrayList that encapsulates the paragraph
                // of text.
                else {
                    docParts.add(new DocumentPart(para));
                    inTable = false;
                }
            }
            return(docParts);
        }
        finally {
            if(fis != null) {
                try {
                  fis.close();  
                }
                catch(IOException ioEx) {
                    // I G N O R E
                }
            }
        }
    }

    /**
     * Performs the comparison of the parts of the two documents, currently
     * it checks for paragraphs that have been moved, inserted and deleted.
     * Note that the actual comparisons are deferred to methods that are
     * optimised to deal with tables and paragraphs.
     *
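     * @param originalDocParts An ArrayList of the DocumentPart objects
     *                         recovered from the original Word document.
     * @param compareToDocParts An ArrayList of the DocumentPart objects
     *                          recovered from the modified Word document.
     * @return An ArrayList of ComparisonResult objects, one for each part
     *         found to have been inserted, deleted or moved.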
     */
    private ArrayList<ComparisonResult> performComparison(
                            ArrayList<DocumentPart> originalDocParts,
                            ArrayList<DocumentPart> compareToDocParts) {
        DocumentPart originalDocPart = null;
        ArrayList<ComparisonResult> comparisonResults =
                new ArrayList<ComparisonResult>();
        // Note the 'older' type of for loop. This is used as the index
        // number of the element is required.
        for(int i = 0; i < originalDocParts.size(); i++) {
            originalDocPart = originalDocParts.get(i);
            if(originalDocPart.isTable()) {
                this.compareTables(originalDocPart, i,
                                   compareToDocParts, comparisonResults);
            }
            else {
                this.compareParagraphs(originalDocPart, i,
                                       compareToDocParts, comparisonResults);
            }
        }
        // If there are any un-matched elements in the compareToDocParts
        // ArrayList, then they MUST refer to newly inserted paragraphs.
        for(DocumentPart compareToDocPart : compareToDocParts) {
            if(!(compareToDocPart.isMatched())) {
                comparisonResults.add(new ComparisonResult(
                        compareToDocPart.getRangeObject(),
                        ComparisonResult.INSERTED));
            }
        }
        return(comparisonResults);
    }

    /**
     * Still to be coded, this method will actually perform the comparison
     * between tables contained within the two documents. Currently, the
     * list of checks includes;
     *      The number of rows.
     *      The number of columns.
     *      The contents of the cells.
     * There are complications with tables, chief amongst which is
     * determining if an existing table has been deleted and replaced with a
     * different table or if that existing table was simply modified. To
     * check for the latter, I am currently thinking along the lines of
     * searching for a sub-table inside a larger table that contains ALL of
     * the values from the original smaller table and in their correct
     * orientations. If this is the case then the table has been modified.
     * If not, it has been deleted and a new table inserted.
     *
     * @param documentPart An instance of the DocumentPart class that
     *                     encapsulates one Table recovered from the
     *                     original Word document.
     * @param index A primitive int value that contains the index within the
     *              containing ArrayList of the original document's Table.
     * @param compareToDocParts An ArrayList containing DocumentPart objects
     *                          that together encapsulate all of the 'parts'
     *                          recovered from the modified Word document.
     * @param comparisonResults An ArrayList that will contain details of
     *                          any and all document parts that have changed
     *                          between the two versions of the Word
     *                          document; between the original and the
     *                          modified.
     */
    private void compareTables(DocumentPart documentPart, int index,
                               ArrayList<DocumentPart> compareToDocParts,
                               ArrayList<ComparisonResult> comparisonResults) {
        // TO DO: Code the table comparison
    }

    /**
     * Performs the actual comparisons of the Paragraph objects. Note that
     * the method does not actually return a value and that the
     * comparisonResults parameter is, to borrow terms from another
     * language, an in-out parameter. It is passed into the method and may
     * be updated by the method's code.
     *
     * Note: this method still does not search for modified paragraphs, only
     * those that have been inserted, deleted or moved.
     *
     * @param documentPart An instance of the DocumentPart class that
     *                     encapsulates one Paragraph recovered from the
     *                     original Word document.
     * @param index A primitive int value that contains the index within the
     *              containing ArrayList of the original document's
     *              Paragraph.
     * @param compareToDocParts An ArrayList containing DocumentPart objects
     *                          that together encapsulate all of the 'parts'
     *                          recovered from the modified Word document.
     * @param comparisonResults An ArrayList that will contain details of
     *                          any and all document parts that have changed
     *                          between the two versions of the Word
     *                          document; between the original and the
     *                          modified.
     */
    private void compareParagraphs(
            DocumentPart documentPart, int index,
            ArrayList<DocumentPart> compareToDocParts,
            ArrayList<ComparisonResult> comparisonResults) {
        // Get the paragraph from the same location in the compareToDocParts
        // ArrayList. Note that if a paragraph has been deleted from the
        // compare to document then the index value could be out of range.
        // If this is the case, the best default is to start searching from
        // the first element in the list.
        if(index >= compareToDocParts.size()) {
            index = 0;
        }
        DocumentPart comparisonPart = compareToDocParts.get(index);
        ComparisonResult comparisonResult = null;
        // If the corresponding part is a table or if it has been matched
        // or if the texts differ then the code should search for a match.
        // At this point, there are only two options available; the
        // matching paragraph has been deleted or moved.
        if(comparisonPart.isTable() ||
           comparisonPart.isMatched() ||
           !(comparisonPart.getParagraphText().equals(
                   documentPart.getParagraphText()))) {
            // Assume the matching part has been deleted, we can change this
            // later if a match is found.
            comparisonResult = new ComparisonResult(
                    documentPart.getRangeObject(), ComparisonResult.DELETED);
            // Step through the ArrayList of compare to parts and examine
            // each in turn. If we get to the end of the list and do not
            // find a match then the paragraph will have been deleted.
            for(int i = 0; i < compareToDocParts.size(); i++) {
                comparisonPart = compareToDocParts.get(i);
                // However, if we find a compare to part that is NOT a
                // table, that has NOT already been matched and whose text
                // matches that of the original part, then the paragraph has
                // simply been moved.
                if(!(comparisonPart.isTable()) &&
                   !(comparisonPart.isMatched()) &&
                   comparisonPart.getParagraphText().equals(
                           documentPart.getParagraphText())) {
                    // Set the comparison result correctly. Set the
                    // comparison status of the compare to part to ensure it
                    // is not checked again and terminate the for loop.
                    comparisonResult.setComparisonResult(
                            ComparisonResult.MOVED);
                    comparisonPart.setComparisonStatus(true);
                    break;
                }
            }
            comparisonResults.add(comparisonResult);
        }
        // If the original two parts match, set the comparison status of the
        // compare to part to ensure it is not checked again.
        else {
            comparisonPart.setComparisonStatus(true);
        }

    }

    /**
     * Creates the document that details the class's findings, i.e. it
     * lists those paragraphs that have been inserted, deleted or moved,
     * which is all that the comparison currently detects.
     *
     * @param comparisonResults An ArrayList of type ComparisonResult. Each
     *                          element encapsulates information about a
     *                          Paragraph or Table that changed between the
     *                          two document versions.
     * @param docTemplate HWPF cannot create a new, empty Word document. It
     *                    requires that a template be used to build the file
     *                    around and this parameter encapsulates the path to
     *                    and name of that template file.
     * @param resultDocName The name of and path to the file that will
     *                      contain the results of the comparison process.
     * @throws java.io.IOException If an I/O Exception occurs
     * @throws java.io.FileNotFoundException Thrown to indicate that the
     *                                       named Microsoft Word file could
     *                                       not be located.
     */
    private void saveResults(ArrayList<ComparisonResult> comparisonResults,
                             String docTemplate, String resultDocName)
            throws IOException, FileNotFoundException {
        File inputFile = null;
        File outputFile = null;
        FileInputStream fis = null;
        FileOutputStream fos = null;
        HWPFDocument document = null;
        Range range = null;
        Paragraph para = null;
        CharacterRun charRun = null;
        ParagraphProperties paraProps = null;
        String text = null;
        try {
            // Get the template file that the results document will
            // be built around.
            inputFile = new File(docTemplate);
            fis = new FileInputStream(inputFile);
            document = new HWPFDocument(fis);
            fis.close();
            fis = null;
            range = document.getOverallRange();
            paraProps = new ParagraphProperties();
            paraProps.setSpacingAfter(50);
            // Sort the comparison results. You may not wish to do this but
            // it will ensure that the results document lists all of the
            // inserted paragraphs followed by all of the deleted ones, then
            // the modified paragraphs and finally the moved paragraphs.
            java.util.Collections.sort(comparisonResults);
            // Step through the comparison results and write each away to
            // the results document. Note the explicit addition of the
            // '\r' character to the end of the text. I do not know why this
            // is necessary, I am just following the instruction in the
            // javadoc.
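            for(ComparisonResult compResult : comparisonResults) {
                // Tables are not yet compared, so an unmatched table can
                // arrive here as an INSERTED result. Skip it, because
                // calling getParagraphText() on a table throws.
                if(compResult.isTable()) {
                    continue;
                }
                para = range.insertAfter(paraProps, 0);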
                text = compResult.getParagraphText();
                text += "\r";
                charRun = para.insertAfter(text);
                // The colour is a guess. There does not appear to be any
                // way to discover the indices for each colour so it may be
                // that some experimentation has to take place to discover
                // which colour maps to which index value.
                charRun.setColor(ComparisonResult.getColour(
                        compResult.getComparisonResult()));
                range = charRun;
            }
            outputFile = new File(resultDocName);
            fos = new FileOutputStream(outputFile);
            document.write(fos);
        }
        finally {
            if(fis != null) {
                try {
                    fis.close();
                    fis = null;
                }
                catch(IOException ioEx) {
                    // I G N O R E
                }
            }
            if(fos != null) {
                try {
                    fos.close();
                    fos = null;
                }
                catch(IOException ioEx) {
                    // I G N O R E
                }
            }
        }
    }

    /**
     * Main entry point to the program.
     *
     * @param args
     */
    public static void main(String[] args) {
        try {
            // Remember that the Empty Doc.doc is an empty Word
            // document file. Just open Word, create a new document
            // and then save it away to file.
            DocumentComparator docComp = new DocumentComparator();
            docComp.compareDocuments("C:/temp/Compare Doc Original.doc",
                                     "C:/temp/Compare Doc Modified.doc",
                                     "C:/temp/Comparison Results.doc",
                                     "C:/temp/Empty Doc.doc");
        }
        catch(FileNotFoundException fnfEx) {
            System.out.println("Caught a FileNotFoundException.");
            System.out.println("Message: " + fnfEx.getMessage());
            System.out.println("Stacktrace follows..............");
            fnfEx.printStackTrace(System.out);
        }
        catch(IOException ioEx) {
            System.out.println("Caught an IOException.");
            System.out.println("Message: " + ioEx.getMessage());
            System.out.println("Stacktrace follows..............");
            ioEx.printStackTrace(System.out);
        }
    }
}

package comparedocuments;

import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Table;
import org.apache.poi.hwpf.usermodel.TableRow;

/**
 * Encapsulates a 'part' of a Microsoft Word document. Currently, that part
 * can either be a Table or a paragraph of text.
 *
 * @author Mark B
 * @version 1.00 27th July 2009.
 */
public class DocumentPart {

    private Range docPart = null;
    private boolean comparisonStatus = false;

    /**
     * Create a new instance of the DocumentPart class using the following
     * parameter.
     *
     * @param docPart An instance of the org.apache.poi.hwpf.usermodel.Range
     *                class that will encapsulate an instance of the
     *                org.apache.poi.hwpf.usermodel.Paragraph or an instance
     *                of the org.apache.poi.hwpf.usermodel.Table class.
     */
    public DocumentPart(Range docPart) {
        this.docPart = docPart;
        // Note that as the part has not been successfully compared to
        // another part the status is false.
        this.comparisonStatus = false;
    }

    /**
     * Has a match been found for this document part?
     *
     * @return A boolean value that indicates whether a match was found
     *         between two document parts.
     */
    public boolean isMatched() {
        return(this.comparisonStatus);
    }

    /**
     * Once a paragraph in the modified document has been found to match one
     * in the original document, this method should be called to set the
     * instance's comparison status to true so that it will not again be
     * used in a comparison operation.
     *
     * @param boolValue A primitive boolean value that will set the
     *                  instance's comparison status.
     */
    public void setComparisonStatus(boolean boolValue) {
        this.comparisonStatus = boolValue;
    }

    /**
     * Does a DocumentPart encapsulate a table?
     * @return A primitive boolean value; true if the DocumentPart
     *         encapsulates a Table, false otherwise.
     */
    public boolean isTable() {
        return(this.docPart instanceof Table);
    }

    /**
     * If the DocumentPart encapsulates a Table, get the number of rows in
     * the table.
     *
     * @return A primitive int whose value indicates how many rows there are
     *         in the table.
     * @throws java.lang.UnsupportedOperationException Thrown if this method
     *         is called for a DocumentPart instance that encapsulates a
     *         Paragraph.
     */
    public int getNumRows() throws UnsupportedOperationException {
        int numRows = 0;
        if(this.isTable()) {
            Table table = (Table)this.docPart;
            numRows = table.numRows();
        }
        else {
            throw new UnsupportedOperationException("The DocumentPart does " +
                    "not encapsulate a Table.");
        }
        return(numRows);
    }

    /**
     * How many columns are there in the Table? This method assumes that the
     * table is 'square', i.e. that each row of the Table holds the same
     * number of columns.
     *
     * @return A primitive int whose value indicates how many columns there
     *         are in the Table.
     * @throws java.lang.UnsupportedOperationException Thrown if this method
     *         is called for a DocumentPart instance that encapsulates a
     *         Paragraph.
     */
    public int getNumColumns() throws UnsupportedOperationException {
        return(this.getNumColumns(0));
    }

    /**
     * How many columns are there in a specific row of the Table?
     *
     * @param rowNum A primitive int that indicates the row to examine.
     *               Remember that row indices are zero based.
     * @return A primitive int whose value indicates how many columns there
     *         are in the Table row.
     * @throws java.lang.UnsupportedOperationException Thrown if this method
     *         is called for a DocumentPart instance that encapsulates a
     *         Paragraph.
     */
    public int getNumColumns(int rowNum)
            throws UnsupportedOperationException {
        int numColumns = 0;
        if(this.isTable()) {
            Table table = (Table)this.docPart;
            TableRow row = table.getRow(rowNum);
            numColumns = row.numCells();
        }
        else {
            throw new UnsupportedOperationException("The DocumentPart does " +
                    "not encapsulate a Table.");
        }
        return(numColumns);
    }

    /**
     * Return the contents of a specific cell. Still to be coded fully.
     *
     * @param rowNum A primitive int that indicates the row the cell is on.
     *               Remember that row indices are zero based.
     * @param colNum A primitive int that indicates the column the cell is
     *               in. Remember that column indices are zero based.
     * @return An instance of the String class that encapsulates the cell's
     *         contents.
     * @throws java.lang.UnsupportedOperationException Thrown if this method
     *         is called for a DocumentPart instance that encapsulates a
     *         Paragraph.
     */
    public String getCellContents(int rowNum, int colNum)
            throws UnsupportedOperationException {
        return("");
    }

    /**
     * Return the text of the Paragraph.
     *
     * @return An instance of the String class that encapsulates the text
     *         the Paragraph contained. Note that this will be stripped of
     *         all fields.
     * @throws java.lang.UnsupportedOperationException Thrown if this method
     *         is called for a DocumentPart instance that encapsulates a
     *         Table.
     */
    public String getParagraphText() throws UnsupportedOperationException {
        String returnValue = null;
        if(!this.isTable()) {
            Paragraph para = (Paragraph)this.docPart;
            returnValue = Range.stripFields(para.text());
        }
        else {
            // Throw the exception type that the javadoc and the throws
            // clause declare, as the other methods of this class do.
            throw new UnsupportedOperationException("The DocumentPart does " +
                    "not encapsulate a Paragraph.");
        }
        return(returnValue);
    }

    /**
     * Get a reference to the Range object - the Table or Paragraph - that a
     * specific DocumentPart instance encapsulates.
     *
     * @return A reference to the encapsulated Range object.
     */
    public Range getRangeObject() {
        return(this.docPart);
    }

    /**
     * Get a String representation of the object's state.
     *
     * @return An instance of the String class that encapsulates information
     *         describing the current state of the object.
     */
    public String toString() {
        StringBuffer buffer = new StringBuffer();
        buffer.append("DocumentPart.\n");
        if(this.isTable()) {
            buffer.append("This document part is a Table.\n");
            int numRows = this.getNumRows();
            int numCells = 0;
            buffer.append("Number of rows: " + numRows + "\n");
            for(int i = 0; i < numRows; i++) {
                numCells = this.getNumColumns(i);
                buffer.append("Number of cells in row " +
                              i +
                              ": " +
                              numCells +
                              "\n");
                for(int j = 0; j < numCells; j++) {
                    buffer.append("Cell " +
                                  j +
                                  " in row " +
                                  i +
                                  " contains: " +
                                  this.getCellContents(i, j) +
                                  "\n");
                }
            }
        }
        else {
            buffer.append("This document part is a Paragraph.\n");
            buffer.append("Text: \n\"");
            buffer.append(this.getParagraphText() + "\"");
        }
        return(buffer.toString());
    }
}

package comparedocuments;

import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Table;

/**
 * An instance of this class encapsulates information about changes that
 * have occurred between versions of a Word document. Currently, it is
 * restricted to tracking changes to paragraphs only and the changes that
 * can be described are the insertion of a new paragraph, the deletion of an
 * existing paragraph, the modification of an existing paragraph and the
 * relocation of an existing paragraph to a new location in the document.
 *
 * @author Mark B
 * @version 1.00 28th July 2009
 */
public class ComparisonResult implements Comparable<ComparisonResult> {

    private Range docPart = null;
    private int comparisonResult = 0;

    public static final int INSERTED = 0;
    public static final int DELETED = 1;
    public static final int MODIFIED = 2;
    public static final int MOVED = 3;

    /**
     * Create a new instance of the ComparisonResult class using the
     * following parameters
     *
     * @param docPart A Range object that encapsulates the document 'part'
     *                recovered from the Word document; this can be either a
     *                Table or Paragraph but note that no checks are made
     *                on the object's type though they ought to be.
     * @param comparisonResult A primitive int that describes the type or
     *                         result of the comparison. Constants are
     *                         provided to support this but no checks are
     *                         made on the value passed.
     */
    public ComparisonResult(Range docPart, int comparisonResult) {
        this.docPart = docPart;
        this.comparisonResult = comparisonResult;
    }

    /**
     * Get the result or type of the comparison.
     *
     * @return A primitive int value that indicates the result of comparing
     *         this document part to others. The following constants have
     *         been declared;
     *             ComparisonResult.INSERTED
     *             ComparisonResult.DELETED
     *             ComparisonResult.MODIFIED
     *             ComparisonResult.MOVED
     *
     */
    public int getComparisonResult() {
        return(this.comparisonResult);
    }

    /**
     * Sets the result or type of the comparison. Note that no checks are
     * made on the value passed to this method; they OUGHT to be.
     *
     * @param comparisonResult A primitive int that describes the result or
     *                         type of the comparison. Constants are
     *                         provided to support this process;
     *                             ComparisonResult.INSERTED
     *                             ComparisonResult.DELETED
     *                             ComparisonResult.MODIFIED
     *                             ComparisonResult.MOVED
     */
    public void setComparisonResult(int comparisonResult) {
        this.comparisonResult = comparisonResult;
    }

    /**
     * Does this instance encapsulate an
     * org.apache.poi.hwpf.usermodel.Table?
     *
     * @return The boolean value 'true' will be returned if the instance
     *         encapsulates a Table, false if a Paragraph.
     */
    public boolean isTable() {
        return(this.docPart instanceof Table);
    }

    /**
     * Return the text of the Paragraph.
     *
     * @return An instance of the String class that encapsulates the text
     *         the Paragraph contained. Note that this will be stripped of
     *         all fields.
     * @throws java.lang.UnsupportedOperationException Thrown if this method
     *         is called for a ComparisonResult instance that encapsulates a
     *         Table.
     */
    public String getParagraphText() throws UnsupportedOperationException {
        String returnValue = null;
        if(!this.isTable()) {
            Paragraph para = (Paragraph)this.docPart;
            returnValue = Range.stripFields(para.text());
        }
        else {
            // Throw the exception type that the Javadoc and the throws
            // clause both promise.
            throw new UnsupportedOperationException("The ComparisonResult " +
                    "does not encapsulate a Paragraph.");
        }
        return(returnValue);
    }

    /**
     * Returns a String representation of the object's internal state.
     *
     * @return A String that describes the object's state.
     */
    public String toString() {
        // Not at all good code but it will suffice for testing.
        StringBuffer buffer = new StringBuffer();
        buffer.append("ComparisonResult.\n");
        if(this.isTable()) {
            buffer.append("For a Table.\n");
        }
        else {
            buffer.append("For a Paragraph.\n");
        }
        // The description of the change is the same for both part types.
        switch(this.getComparisonResult()) {
            case INSERTED:
                buffer.append("It has been inserted.\n");
                break;
            case DELETED:
                buffer.append("It has been deleted.\n");
                break;
            case MODIFIED:
                buffer.append("It has been modified.\n");
                break;
            case MOVED:
                buffer.append("It has been moved.\n");
                break;
        }
        // Only a Paragraph has text that can be reported.
        if(!this.isTable()) {
            buffer.append("Text: \n\"");
            buffer.append(this.getParagraphText() + "\"");
        }
        return(buffer.toString());
    }

    /**
     * Compares the current object (this) with the specified object to
     * determine an ordering.
     *
     * @param compResult The ComparisonResult object that is to be compared.
     * @return A negative integer if this object is less than, zero if it is
     *         equal to, or a positive integer if it is greater than the
     *         specified object.
     */
    public int compareTo(ComparisonResult compResult) {
        return(this.getComparisonResult() -
               compResult.getComparisonResult());
    }

    /**
     * A hack method that attempts to cover for HWPF's inability to provide
     * colour information. I have found out that, for the specific template
     * file I am using, 2 is the colour index for blue, 11 the index for
     * green, 6 the index for red, 5 the index for purple and 1 one of the
     * indices for black.
     *
     * @param comparisonResult A primitive int whose value indicates whether
     *                         the result of the comparison was that a
     *                         paragraph had been inserted, deleted,
     *                         modified or moved.
     * @return A primitive int whose value is the index to a colour in the
     *         document colour table.
     */
    public static int getColour(int comparisonResult) {
        int returnValue = 0;
        switch(comparisonResult) {
            case ComparisonResult.INSERTED:
                // Blue
                returnValue = 2;
                break;
            case ComparisonResult.DELETED:
                // Green
                returnValue = 11;
                break;
            case ComparisonResult.MODIFIED:
                // Red
                returnValue = 6;
                break;
            case ComparisonResult.MOVED:
                // Purple
                returnValue = 5;
                break;
            default:
                // Black
                returnValue = 1;
        }
        return(returnValue);
    }
}
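
Because compareTo() orders on the numeric value of the constants, sorting a list of results groups them as INSERTED, DELETED, MODIFIED and then MOVED. The following is only a rough sketch of that ordering; ResultStub is a made-up stand-in for ComparisonResult so that it runs without POI on the classpath, and Integer.compare is used in place of the subtraction purely out of habit (the subtraction is safe for values this small).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortOrderSketch {

    // Same values as the constants declared in ComparisonResult.
    public static final int INSERTED = 0;
    public static final int DELETED = 1;
    public static final int MODIFIED = 2;
    public static final int MOVED = 3;

    // Hypothetical stand-in for ComparisonResult; it holds only the type.
    static class ResultStub implements Comparable<ResultStub> {
        final int type;

        ResultStub(int type) {
            this.type = type;
        }

        public int compareTo(ResultStub other) {
            return Integer.compare(this.type, other.type);
        }
    }

    // Wraps each type in a stub, sorts the stubs with Collections.sort()
    // and returns the types in their sorted order.
    public static List<Integer> sortedTypes(int... types) {
        List<ResultStub> results = new ArrayList<ResultStub>();
        for (int type : types) {
            results.add(new ResultStub(type));
        }
        Collections.sort(results);
        List<Integer> sorted = new ArrayList<Integer>();
        for (ResultStub result : results) {
            sorted.add(result.type);
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Prints [0, 0, 1, 2, 3]
        System.out.println(
                sortedTypes(MOVED, INSERTED, MODIFIED, DELETED, INSERTED));
    }
}
```

If a different grouping is wanted in the results document, changing the values of the constants (or the comparison itself) is all that should be needed.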


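The constructor and setComparisonResult() both accept any int, and getColour() quietly falls back to black for a value it does not recognise. One possible way to get the missing checks is to replace the int constants with an enum that also carries the colour index. This is just a sketch of that alternative and not part of the code posted here: the names ChangeTypeSketch, ChangeType, getColourIndex() and fromInt() are invented for the example, and the colour indices are the ones found for one particular template file, so they may well differ for another.

```java
public class ChangeTypeSketch {

    // Hypothetical replacement for the int constants in ComparisonResult.
    // The declaration order matches the existing values (0 to 3), so the
    // natural enum ordering sorts results the same way as compareTo().
    public enum ChangeType {
        INSERTED(2),   // blue
        DELETED(11),   // green
        MODIFIED(6),   // red
        MOVED(5);      // purple

        private final int colourIndex;

        ChangeType(int colourIndex) {
            this.colourIndex = colourIndex;
        }

        // Index into the colour table of the template document.
        public int getColourIndex() {
            return colourIndex;
        }

        // Converts one of the old int constants, rejecting bad values.
        public static ChangeType fromInt(int value) {
            ChangeType[] types = values();
            if (value < 0 || value >= types.length) {
                throw new IllegalArgumentException(
                        "Unknown comparison result: " + value);
            }
            return types[value];
        }
    }

    public static void main(String[] args) {
        for (ChangeType type : ChangeType.values()) {
            System.out.println(
                    type + " -> colour index " + type.getColourIndex());
        }
    }
}
```

With something like this, an invalid value fails at the point it is passed in rather than showing up later as black text in the results document.
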
MSB wrote:
> 
> Well, in that case, this may be well timed. Made a bit more progress today
> - even though I should have been in the workshop(!!) - and thought that
> you would be interested, so please find the code at the bottom of this
> email.
> 
> As you will be able to see, I have re-factored the DocumentPart class and
> added a ComparisonResult class. Currently, I have coded a comparison
> method that works for Paragraphs and checks to see if a paragraph has been
> deleted, moved or inserted. At the moment, I have dodged the issue of
> determining whether a paragraph has been modified and have not put in the
> support for tables as there is a similar issue there as well.
> 
> Either way, have a look at how the code is progressing and see if it
> 'works' for you. Also, test it thoroughly with a few documents; my tests
> indicate that it is going to be quite sensitive - for example, when a
> paragraph is deleted it flags this and also notes that the paragraphs
> below it have been moved!
> 
> I think it should be possible to put together something quick and dirty
> for a demo and would advise that you test the code to see if it works for
> your documents and then possibly concentrate on producing some sort of
> output document. For a demo, it may be wise to set aside the issues
> surrounding modified paragraphs and tables and ensure that it can detect
> deleted, moved or inserted paragraphs and then produce some output.
> 
> Yours
> 
> Mark B
> 
> package comparedocuments;
> 
> import java.io.File;
> import java.io.FileInputStream;
> import java.util.ArrayList;
> import java.io.FileNotFoundException;
> import java.io.IOException;
> 
> import org.apache.poi.hwpf.HWPFDocument;
> import org.apache.poi.hwpf.usermodel.Range;
> import org.apache.poi.hwpf.usermodel.Paragraph;
> 
> /**
>  * An instance of this class can be used to perform a comparison between
>  * two binary (OLE2CDF) Microsoft Word documents.
>  *
>  * @author Mark B
>  * @version 1.00 27th July 2009
>  */
> public class DocumentComparator {
>     
>     /**
>      * Called to compare the two documents and output the results of the
>      * comparison to a third Microsoft Word document.
>      * 
>      * @param originalDoc The path to and name of the original document,
>      *                    the document that is the basis for the comparison.
>      * @param compareToDoc The path to and name of the document that
>      *                     should be compared with the original for any
>      *                     modifications.
>      * @param resultDocName The path to and name of the document that
>      *                      should contain the results of the comparison
>      *                      process.
>      * @param docTemplate The path to and name of the empty Word document
>      *                    that should be used as the basis for the results
>      *                    document.
>      * @throws java.io.IOException Thrown to signal that some sort of I/O
>      *                             Exception has occurred.
>      * @throws java.io.FileNotFoundException Thrown to signal that a file
>      *                                       could not be located.
>      */
>     public void compareDocuments(String originalDoc, String compareToDoc,
>                                  String resultDocName, String docTemplate)
>                                  throws IOException, FileNotFoundException {
>         ArrayList<DocumentPart> originalDocParts =
>                 this.loadDocument(originalDoc);
>         ArrayList<DocumentPart> compareToDocParts =
>                 this.loadDocument(compareToDoc);
>         ArrayList<ComparisonResult> comparisonResults =
>                 this.performComparison(originalDocParts, compareToDocParts);
>         if(!comparisonResults.isEmpty()) {
>             for(ComparisonResult compResult : comparisonResults) {
>                 System.out.println(compResult.toString());
>             }
>         }
>         //this.saveResults(comparisonResults, docTemplate, resultDocName);
>     }
>     
>     /**
>      * Opens a named binary (OLE2CDF) Microsoft Word document and converts
>      * that document's contents into an ArrayList of instances of the
>      * DocumentPart class.
>      *
>      * @param docName The path to and name of a Microsoft Word document
>      *                file.
>      * @return An instance of the ArrayList class encapsulating instances
>      *         of the DocumentPart class. Each DocumentPart will
>      *         encapsulate information about a paragraph of text or a table
>      *         recovered from the Microsoft Word document.
>      * @throws java.io.IOException If an I/O Exception occurs
>      * @throws java.io.FileNotFoundException Thrown to indicate that the
>      *                                       named Microsoft Word file
>      *                                       could not be located.
>      */
>     private ArrayList<DocumentPart> loadDocument(String docName)
>                              throws IOException, FileNotFoundException {
>         File file = null;
>         FileInputStream fis = null;
>         HWPFDocument document = null;
>         Range overallRange = null;
>         Paragraph para = null;
>         int numParas = 0;
>         boolean inTable = false;
>         ArrayList<DocumentPart> docParts = null;
>         try {
>             // Open the Word file.
>             file = new File(docName);
>             fis = new FileInputStream(file);
>             document = new HWPFDocument(fis);
>             // Get the overall Range for the document and the number
>             // of paragraphs from this Range.
>             overallRange = document.getOverallRange();
>             numParas = overallRange.numParagraphs();
>             docParts = new ArrayList<DocumentPart>(numParas);
>             for(int i = 0; i < numParas; i++) {
>                 para = overallRange.getParagraph(i);
>                 // Is the paragraph 'in' a table? If so, it is possible to
>                 // recover a reference to that Table from the first
>                 // paragraph only. If calls are made to the getTable()
>                 // method using subsequent paragraphs then an exception
>                 // will be thrown. So, after getting the Table, a flag is
>                 // set to prevent further calls to the getTable() method.
>                 if(para.isInTable()) {
>                     if(!inTable) {
>                         // Get a reference to the Table and pass it to the
>                         // constructor of the DocumentPart class. Add the
>                         // DocumentPart instance to the ArrayList.
>                         docParts.add(new DocumentPart(
>                                 overallRange.getTable(para)));
>                         inTable = true;
>                     }
>                 }
>                 // The paragraph is not in a table so simply add a new
>                 // instance to the ArrayList that encapsulates the
>                 // paragraph of text.
>                 else {
>                     docParts.add(new DocumentPart(para));
>                     inTable = false;
>                 }
>             }
>             return(docParts);
>         }
>         finally {
>             if(fis != null) {
>                 try {
>                   fis.close();  
>                 }
>                 catch(IOException ioEx) {
>                     // I G N O R E
>                 }
>             }
>         }
>     }
> 
>     /**
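>      * Performs the comparison of the parts of the two documents;
>      * currently it checks for paragraphs that have been moved, inserted
>      * and deleted. Note that the actual comparisons are deferred to
>      * methods that are optimised to deal with tables and paragraphs.
>      *
>      * @param originalDocParts The parts recovered from the original
>      *                         document.
>      * @param compareToDocParts The parts recovered from the document that
>      *                          is being compared with the original.
>      * @return An ArrayList of ComparisonResult objects, one for each
>      *         difference found between the two documents.
>      */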
>     private ArrayList<ComparisonResult> performComparison(
>                             ArrayList<DocumentPart> originalDocParts,
>                             ArrayList<DocumentPart> compareToDocParts) {
>         DocumentPart originalDocPart = null;
>         ArrayList<ComparisonResult> comparisonResults =
>                 new ArrayList<ComparisonResult>();
>         // Note the 'older' type of for loop. This is used as the index
>         // number of the element is required.
>         for(int i = 0; i < originalDocParts.size(); i++) {
>             originalDocPart = originalDocParts.get(i);
>             if(originalDocPart.isTable()) {
>                 this.compareTables(originalDocPart, i,
>                                    compareToDocParts, comparisonResults);
>             }
>             else {
>                 this.compareParagraphs(originalDocPart, i,
>                                        compareToDocParts,
>                                        comparisonResults);
>             }
>         }
>         // If there are any un-matched elements in the compareToDocParts
>         // ArrayList, then they MUST refer to newly inserted paragraphs.
>         for(DocumentPart compareToDocPart : compareToDocParts) {
>             if(!(compareToDocPart.isMatched())) {
>                 comparisonResults.add(new ComparisonResult(
>                         compareToDocPart.getRangeObject(),
>                         ComparisonResult.INSERTED));
>             }
>         }
>         return(comparisonResults);
>     }
> 
>     /**
>      * Still to be coded, this method will actually perform the comparison
>      * between tables contained within the two documents. Currently, the
>      * list of checks includes;
>      *      The number of rows.
>      *      The number of columns.
>      *      The contents of the cells.
>      * There are complications with tables, chief amongst which is
>      * determining if an existing table has been deleted and replaced with
>      * a different table or if that existing table was simply modified. To
>      * check for the latter, I am currently thinking along the lines of
>      * searching for a sub-table inside a larger table that contains ALL
>      * of the values from the original smaller table and in their correct
>      * orientations. If this is the case then the table has been modified.
>      * If not it has been deleted and a new table inserted.
>      *
>      * @param docPart An instance of the DocumentPart class that
>      *                encapsulates one Table recovered from the original
>      *                Word document.
>      * @param index A primitive int value that contains the index within
>      *              the containing ArrayList of the original document's
>      *              Table.
>      * @param compareToDocParts An ArrayList containing DocumentPart
>      *                          objects that together encapsulate all of
>      *                          the 'parts' recovered from the modified
>      *                          Word document.
>      * @param comparisonResults An ArrayList that will contain details of
>      *                          any and all document parts that have
>      *                          changed between the two versions of the
>      *                          Word document; between the original and
>      *                          the modified.
>      */
>     private void compareTables(DocumentPart docPart, int index,
>                                ArrayList<DocumentPart> compareToDocParts,
>                                ArrayList<ComparisonResult> comparisonResults) {
>         // TO DO: Code the table comparison
>     }
> 
>     /**
>      * Performs the actual comparisons of the Paragraph objects. Note that
>      * the method does not actually return a value and that the
>      * comparisonResults parameter is - to borrow terms from another
>      * language - an in-out parameter. It is passed into the method and may
>      * be updated by the method's code.
>      *
>      * Note: this method still does not search for modified paragraphs,
>      * only those that have been inserted, deleted or moved.
>      *
>      * @param documentPart An instance of the DocumentPart class that
>      *                     encapsulates one Paragraph recovered from the
>      *                     original Word document.
>      * @param index A primitive int value that contains the index within
>      *              the containing ArrayList of the original document's
>      *              Paragraph.
>      * @param compareToDocParts An ArrayList containing DocumentPart
>      *                          objects that together encapsulate all of
>      *                          the 'parts' recovered from the modified
>      *                          Word document.
>      * @param comparisonResults An ArrayList that will contain details of
>      *                          any and all document parts that have
>      *                          changed between the two versions of the
>      *                          Word document; between the original and
>      *                          the modified.
>      */
>     private void compareParagraphs(DocumentPart documentPart, int index,
>                                    ArrayList<DocumentPart> compareToDocParts,
>                                    ArrayList<ComparisonResult> comparisonResults) {
>         // Get the paragraph from the same location in the
>         // compareToDocParts ArrayList. Note that if a paragraph has been
>         // deleted from the compare to document then the index value could
>         // be out of range. If this is the case, the best default is to
>         // start searching from the first element in the array.
>         if(index >= compareToDocParts.size()) {
>             index = 0;
>         }
>         DocumentPart comparisonPart = compareToDocParts.get(index);
>         ComparisonResult comparisonResult = null;
>         // If the corresponding part is a table or if it has been matched
>         // or if the texts differ then the code should search for a match.
>         // At this point, there are only two options available; the
>         // matching paragraph has been deleted or moved.
>         if(comparisonPart.isTable() ||
>            comparisonPart.isMatched() ||
>            !(comparisonPart.getParagraphText().equals(
>                    documentPart.getParagraphText()))) {
>             // Assume the matching part has been deleted, we can change
>             // this later if a match is found.
>             comparisonResult = new ComparisonResult(
>                     documentPart.getRangeObject(),
>                     ComparisonResult.DELETED);
>             // Step through the ArrayList of compare to parts and examine
>             // each in turn. If we get to the end of the list and do not
>             // find a match then the paragraph will have been deleted.
>             for(int i = 0; i < compareToDocParts.size(); i++) {
>                 comparisonPart = compareToDocParts.get(i);
>                 // However, if we find a compare to part that is NOT a
>                 // table, that has NOT already been matched and whose text
>                 // matches that of the original part, then the paragraph
>                 // has simply been moved.
>                 if(!(comparisonPart.isTable()) &&
>                    !(comparisonPart.isMatched()) &&
>                    comparisonPart.getParagraphText().equals(
>                            documentPart.getParagraphText())) {
>                     // Set the comparison result correctly. Set the
>                     // comparison status of the compare to part to ensure
>                     // it is not checked again and terminate the for loop.
>                     comparisonResult.setComparisonResult(
>                             ComparisonResult.MOVED);
>                     comparisonPart.setComparisonStatus(true);
>                     i = compareToDocParts.size();
>                 }
>             }
>             comparisonResults.add(comparisonResult);
>         }
>         // If the original two parts match, set the comparison status of
>         // the compare to part to ensure it is not checked again.
>         else {
>             comparisonPart.setComparisonStatus(true);
>         }
> 
>     }
> 
>     /**
>      * To be coded. This method will create the document that details the
>      * class's 'findings', i.e. it will list those paragraphs that have
>      * been inserted, deleted or moved - currently.
>      *
>      * @param comparisonResults An ArrayList of type ComparisonResult.
>      *                          Each element encapsulates information
>      *                          about a Paragraph or Table that changed
>      *                          between the two document versions.
>      * @param docTemplate HWPF cannot create a new, empty Word document.
>      *                    It requires that a template be used to build the
>      *                    file around and this parameter encapsulates the
>      *                    path to and name of that template file.
>      * @param resultDocName The name of and path to the file that should
>      *                      contain the results of the comparison process.
>      * @throws java.io.IOException If an I/O Exception occurs
>      * @throws java.io.FileNotFoundException Thrown to indicate that the
>      *                                       named Microsoft Word file
>      *                                       could not be located.
>      */
>     private void saveResults(ArrayList<ComparisonResult> comparisonResults,
>                              String docTemplate, String resultDocName)
>                              throws IOException, FileNotFoundException {
>         // TO DO: Code saving of results.
>     }
> 
>     /**
>      * Main entry point to the program.
>      *
>      * @param args
>      */
>     public static void main(String[] args) {
>         try {
>             DocumentComparator docComp = new DocumentComparator();
>             docComp.compareDocuments("C:/temp/Compare Doc Original.doc",
>                                      "C:/temp/Compare Doc Modified.doc",
>                                      "results document",
>                                      "results document template");
>         }
>         catch(FileNotFoundException fnfEx) {
>             System.out.println("Caught a FileNotFoundException.");
>             System.out.println("Message: " + fnfEx.getMessage());
>             System.out.println("Stacktrace follows..............");
>             fnfEx.printStackTrace(System.out);
>         }
>         catch(IOException ioEx) {
>             System.out.println("Caught an IOException.");
>             System.out.println("Message: " + ioEx.getMessage());
>             System.out.println("Stacktrace follows..............");
>             ioEx.printStackTrace(System.out);
>         }
>     }
> }
> 
> package comparedocuments;
> 
> import org.apache.poi.hwpf.usermodel.Range;
> import org.apache.poi.hwpf.usermodel.Paragraph;
> import org.apache.poi.hwpf.usermodel.Table;
> import org.apache.poi.hwpf.usermodel.TableRow;
> 
> /**
>  * Encapsulates a 'part' of a Microsoft Word document. Currently, that
>  * part can either be a Table or a paragraph of text.
>  *
>  * @author Mark B
>  * @version 1.00 27th July 2009.
>  */
> public class DocumentPart {
> 
>     private Range docPart = null;
>     private boolean comparisonStatus = false;
> 
>     /**
>      * Create a new instance of the DocumentPart class using the following
>      * parameter.
>      *
>      * @param docPart An instance of the
>      *                org.apache.poi.hwpf.usermodel.Range class that will
>      *                encapsulate an instance of the
>      *                org.apache.poi.hwpf.usermodel.Paragraph or an
>      *                instance of the org.apache.poi.hwpf.usermodel.Table
>      *                class.
>      */
>     public DocumentPart(Range docPart) {
>         this.docPart = docPart;
>         // Note that as the part has not been successfully compared to
>         // another part the status is false.
>         this.comparisonStatus = false;
>     }
> 
>     /**
>      * Has a match been found for this document part?
>      *
>      * @return A boolean value that indicates whether a match was found
>      *         between two document parts.
>      */
>     public boolean isMatched() {
>         return(this.comparisonStatus);
>     }
> 
>     /**
>      * Once a paragraph in the modified document has been found to match
>      * one in the original document, this method should be called to set
>      * the instance's comparison status to true so that it will not again
>      * be used in a comparison operation.
>      *
>      * @param boolValue A primitive boolean value that will set the
>      *                  instance's comparison status.
>      */
>     public void setComparisonStatus(boolean boolValue) {
>         this.comparisonStatus = boolValue;
>     }
> 
>     /**
>      * Does a DocumentPart encapsulate a table?
>      * @return A primitive boolean value; true if the DocumentPart
>      *         encapsulates a Table, false otherwise.
>      */
>     public boolean isTable() {
>         return(this.docPart instanceof Table);
>     }
> 
>     /**
>      * If the DocumentPart encapsulates a Table, get the number of rows in
>      * the table.
>      *
>      * @return A primitive int whose value indicates how many rows there
>      *         are in the table.
>      * @throws java.lang.UnsupportedOperationException Thrown if this
>      *         method is called for a DocumentPart instance that
>      *         encapsulates a Paragraph.
>      */
>     public int getNumRows() throws UnsupportedOperationException {
>         int numRows = 0;
>         if(this.isTable()) {
>             Table table = (Table)this.docPart;
>             numRows = table.numRows();
>         }
>         else {
>             throw new UnsupportedOperationException("The DocumentPart " +
>                     "does not encapsulate a Table.");
>         }
>         return(numRows);
>     }
> 
>     /**
>      * How many columns are there in the Table? This method assumes that
>      * the table is 'square', i.e. that each row of the Table holds the
>      * same number of columns.
>      *
>      * @return A primitive int whose value indicates how many columns
>      *         there are in the Table.
>      * @throws java.lang.UnsupportedOperationException Thrown if this
>      *         method is called for a DocumentPart instance that
>      *         encapsulates a Paragraph.
>      */
>     public int getNumColumns() throws UnsupportedOperationException {
>         return(this.getNumColumns(0));
>     }
> 
>     /**
>      * How many columns are there in a specific row of the Table?
>      *
>      * @param rowNum A primitive int that indicates the row to examine.
>      *               Remember that row indices are zero based.
>      * @return A primitive int whose value indicates how many columns
>      *         there are in the Table row.
>      * @throws java.lang.UnsupportedOperationException Thrown if this
>      *         method is called for a DocumentPart instance that
>      *         encapsulates a Paragraph.
>      */
>     public int getNumColumns(int rowNum)
>                               throws UnsupportedOperationException {
>         int numColumns = 0;
>         if(this.isTable()) {
>             Table table = (Table)this.docPart;
>             TableRow row = table.getRow(rowNum);
>             numColumns = row.numCells();
>         }
>         else {
>             throw new UnsupportedOperationException("The DocumentPart " +
>                     "does not encapsulate a Table.");
>         }
>         return(numColumns);
>     }
> 
>     /**
>      * Return the contents of a specific cell. Still to be coded fully.
>      *
>      * @param rowNum A primitive int that indicates the row the cell is
>      *               on. Remember that row indices are zero based.
>      * @param colNum A primitive int that indicates the column the cell is
>      *               in. Remember that column indices are zero based.
>      * @return An instance of the String class that encapsulates the cell's
>      *         contents
>      * @throws java.lang.UnsupportedOperationException Thrown if this
>      *         method is called for a DocumentPart instance that
>      *         encapsulates a Paragraph.
>      */
>     public String getCellContents(int rowNum, int colNum)
>                               throws UnsupportedOperationException {
>         return("");
>     }
> 
>     /**
>      * Return the text of the Paragraph.
>      *
>      * @return An instance of the String class that encapsulates the text
>      *         the Paragraph contained. Note that this will be stripped of
>      *         all fields.
>      * @throws java.lang.UnsupportedOperationException Thrown if this
>      *         method is called for a DocumentPart instance that
>      *         encapsulates a Table.
>      */
>     public String getParagraphText() throws UnsupportedOperationException {
>         String returnValue = null;
>         if(!this.isTable()) {
>             Paragraph para = (Paragraph)this.docPart;
>             returnValue = Range.stripFields(para.text());
>         }
>         else {
>             // Throw the exception type that the Javadoc and the throws
>             // clause both promise.
>             throw new UnsupportedOperationException("The DocumentPart " +
>                     "does not encapsulate a Paragraph.");
>         }
>         return(returnValue);
>     }
> 
>     /**
>      * Get a reference to the Range object - the Table or Paragraph - that
>      * a specific DocumentPart instance encapsulates.
>      *
>      * @return A reference to the encapsulated Range object.
>      */
>     public Range getRangeObject() {
>         return(this.docPart);
>     }
> 
>     /**
>      * Get a String representation of the object's state.
>      *
>      * @return An instance of the String class that encapsulates
>      *         information describing the current state of the object.
>      */
>     public String toString() {
>         StringBuffer buffer = new StringBuffer();
>         buffer.append("DocumentPart.\n");
>         if(this.isTable()) {
>             buffer.append("This document part is a Table.\n");
>             int numRows = this.getNumRows();
>             int numCells = 0;
>             buffer.append("Number of rows: " + numRows + "\n");
>             for(int i = 0; i < numRows; i++) {
>                 numCells = this.getNumColumns(i);
>                 buffer.append("Number of cells in row " +
>                               i +
>                               ": " +
>                               numCells +
>                               "\n");
>                 for(int j = 0; j < numCells; j++) {
>                     buffer.append("Cell " +
>                                   j +
>                                   " in row " +
>                                   i +
>                                   " contains: " +
>                                   this.getCellContents(i, j) +
>                                   "\n");
>                 }
>             }
>         }
>         else {
>             buffer.append("This document part is a Paragraph.\n");
>             buffer.append("Text: \n\"");
>             buffer.append(this.getParagraphText() + "\"");
>         }
>         return(buffer.toString());
>     }
> }
> 
> package comparedocuments;
> 
> import org.apache.poi.hwpf.usermodel.Range;
> import org.apache.poi.hwpf.usermodel.Paragraph;
> import org.apache.poi.hwpf.usermodel.Table;
> 
> /**
>  * An instance of this class encapsulates information about changes that
> have
>  * occurred between versions of a Word document. Currently, it is
> restricted to
>  * tracking changes to paragraphs only and the changes that can be
> described are
>  * the insertion of a new paragraph, the deletion of an existing
> paragraph, the
>  * modification of an existing paragraph and the relocation of an existing
>  * paragraph to a new location in the document.
>  *
>  * @author Mark B
>  * @version 1.00 28th July 2009
>  */
> public class ComparisonResult {
> 
>     private Range docPart = null;
>     private int comparisonResult = 0;
> 
>     public static final int INSERTED = 0;
>     public static final int DELETED = 1;
>     public static final int MODIFIED = 2;
>     public static final int MOVED = 3;
> 
>     /**
>      * Create a new instance of the ComparisonResult class using the
>      * following parameters
>      *
>      * @param docPart A Range object that encapsulates the document 'part'
>      *                recovered from the Word document; this can be either
> a
>      *                Table or Paragraph but note that no checks are made
>      *                on the object's type, though they ought to be.
>      * @param comparisonResult A primitive int that describes the type or
>      *                         result of the comparison. Constants are
>      *                         provided to support this but no checks are
>      *                         made on the value passed.
>      */
>     public ComparisonResult(Range docPart, int comparisonResult) {
>         this.docPart = docPart;
>         this.comparisonResult = comparisonResult;
>     }
> 
>     /**
>      * Get the result or type of the comparison.
>      *
>      * @return A primitive int value that indicates the result of
> comparing
>      *         this document part to others. The following constants have
> been
>      *         declared;
>      *             ComparisonResult.INSERTED
>      *             ComparisonResult.DELETED
>      *             ComparisonResult.MODIFIED
>      *             ComparisonResult.MOVED
>      *
>      */
>     public int getComparisonResult() {
>         return(this.comparisonResult);
>     }
> 
>     /**
>      * Sets the result or type of the comparison. Note that no checks are
>      * made on the value passed to this method; they OUGHT to be.
>      *
>      * @param comparisonResult A primitive int that describes the result
> or
>      *                         type of the comparison. Constants are
> provided
>      *                         to support this process;
>      *                             ComparisonResult.INSERTED
>      *                             ComparisonResult.DELETED
>      *                             ComparisonResult.MODIFIED
>      *                             ComparisonResult.MOVED
>      */
>     public void setComparisonResult(int comparisonResult) {
>         this.comparisonResult = comparisonResult;
>     }
> 
>     /**
>      * Does this instance encapsulate an
>      * org.apache.poi.hwpf.usermodel.Table?
>      *
>      * @return The boolean value 'true' will be returned if the instance
>      *         encapsulates a Table, false if a Paragraph.
>      */
>     public boolean isTable() {
>         return(this.docPart instanceof Table);
>     }
> 
>     /**
>      * Return the text of the Paragraph.
>      *
>      * @return An instance of the String class that encapsulates the text
>      *         the Paragraph contained. Note that this will be stripped of
>      *         all fields.
>      * @throws java.lang.UnsupportedOperationException Thrown if this
> method is
>      *         called for a ComparisonResult instance that encapsulates a
> Table.
>      */
>     public String getParagraphText() throws UnsupportedOperationException
> {
>         String returnValue = null;
>         if(!this.isTable()) {
>             Paragraph para = (Paragraph)this.docPart;
>             returnValue = Range.stripFields(para.text());
>         }
>         else {
>             throw new UnsupportedOperationException("The ComparisonResult " +
>                     "does not encapsulate a Paragraph.");
>         }
>         return(returnValue);
>     }
> 
>     /**
>      * Returns a String representation of the object's internal state.
>      *
>      * @return A String that describes the object's state.
>      */
>     public String toString() {
>         // Not at all good code but it will suffice for testing.
>         StringBuffer buffer = new StringBuffer();
>         buffer.append("ComparisonResult.\n");
>         if(this.isTable()) {
>             buffer.append("For a Table.\n");
>             switch(this.getComparisonResult()) {
>                 case INSERTED:
>                     buffer.append("It has been inserted.\n");
>                     break;
>                 case DELETED:
>                     buffer.append("It has been deleted.\n");
>                     break;
>                 case MODIFIED:
>                     buffer.append("It has been modified.\n");
>                     break;
>                 case MOVED:
>                     buffer.append("It has been moved.\n");
>                     break;
>             }
>         }
>         else {
>             buffer.append("For a Paragraph.\n");
>             switch(this.getComparisonResult()) {
>                 case INSERTED:
>                     buffer.append("It has been inserted.\n");
>                     break;
>                 case DELETED:
>                     buffer.append("It has been deleted.\n");
>                     break;
>                 case MODIFIED:
>                     buffer.append("It has been modified.\n");
>                     break;
>                 case MOVED:
>                     buffer.append("It has been moved.\n");
>                     break;
>             }
>             buffer.append("Text: \n\"");
>             buffer.append(this.getParagraphText() + "\"");
>         }
>         return(buffer.toString());
>     }
> }
> 
> 
> 
> bihag wrote:
>> 
>> Hi Mark, 
>> 
>> Sorry, but I haven't started working on it yet, and I won't be able to
>> start for the next 2 days as we have our product demo tomorrow ...
>> 
>> anyway thanks a lot for your time and efforts.
>> 
>> Regards,
>> Bihag Raval.
>> 
>> 
>> 
>> MSB wrote:
>>> 
>>> Had the chance to re-think the work last night and would like to propose
>>> a few changes.
>>> 
>>> Firstly, the DocumentPart class should be refactored IMO; it is trying
>>> to do two things at once and this is never a good idea. I propose
>>> removing the following;
>>> 
>>> public static final int INSERTED = 0;
>>> public static final int DELETED = 1;
>>> public static final int MODIFIED = 2;
>>> public static final int UN_MODIFIED = 3;
>>> public static final int MOVED = 4;
>>> 
>>> and
>>> 
>>>     /**
>>>      * Get the result of the comparison.
>>>      *
>>>      * @return A primitive int value that indicates the result of
>>> comparing
>>>      *         this document part to others. The following constants
>>> have been
>>>      *         declared;
>>>      *             DocumentPart.INSERTED = 0;
>>>      *             DocumentPart.DELETED = 1;
>>>      *             DocumentPart.MODIFIED = 2;
>>>      *             DocumentPart.UN_MODIFIED = 3;
>>>      *             DocumentPart.MOVED = 4;
>>>      *
>>>      */
>>>     public int getComparisonResult() {
>>>         return(this.comparisonResult);
>>>     }
>>> 
>>>     /**
>>>      * Store the result of the comparison between document parts.
>>>      *
>>>      * @param comparisonResult A primitive int whose value indicates the
>>> result
>>>      *                         of comparing one document part with
>>> others.
>>>      */
>>>     public void setComparisonResult(int comparisonResult) {
>>>         this.comparisonResult = comparisonResult;
>>>     }
>>> 
>>>  
>>> Next, I propose adding a new class called something like ComparisonResult
>>> or ReportableComparison. It will encapsulate the information removed from
>>> the DocumentPart class along with a Range object. Its purpose is to
>>> track a reportable comparison: an insertion, deletion, modification or
>>> transformation. I propose that as we detect one of these when comparing
>>> the two documents, an instance of the new class is created, the affected
>>> Range is copied over and the appropriate comparison result setting made.
>>> Following the comparison between the two documents, an ArrayList of
>>> comparison result objects can be used to create the report.
>>> 
>>> Thirdly, there is something you may wish to discuss with your
>>> colleagues; it is a fall-back position for us in case HWPF cannot
>>> successfully create the comparison report. The Rich Text Format (RTF)
>>> is an open, relatively simple file format developed by Microsoft. It is
>>> possible to create an RTF file with a .doc extension such that when the
>>> user double clicks on the file, Word is used to open it. A rather good
>>> API exists - called iText - that makes it possible to create RTF files
>>> using Java code, so it should be possible for this application to output
>>> its results as an RTF file with a .doc extension. The users may never
>>> know any different.
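
To make the RTF fall-back quoted above a little more concrete, here is a minimal sketch that writes raw RTF by hand rather than through iText (I have deliberately not guessed at iText's API). The class name RtfReportSketch and its methods are hypothetical and are not part of the posted code, and the sketch assumes the paragraph text is plain ASCII. It uses red for deleted and blue for inserted paragraphs, as asked for in the original request.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of the posted classes.
public class RtfReportSketch {

    // Escape the three characters that have special meaning in RTF text.
    public static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build the whole report as one RTF string.
    public static String buildReport(List<String> inserted, List<String> deleted) {
        StringBuilder rtf = new StringBuilder();
        rtf.append("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}");
        // Colour table: entry 0 is the default colour, 1 is red, 2 is blue.
        rtf.append("{\\colortbl;\\red255\\green0\\blue0;\\red0\\green0\\blue255;}");
        rtf.append("\\f0 ");
        for (String para : deleted) {
            rtf.append("\\cf1 ").append(escape(para)).append("\\par\n");
        }
        for (String para : inserted) {
            rtf.append("\\cf2 ").append(escape(para)).append("\\par\n");
        }
        rtf.append("}");
        return rtf.toString();
    }

    // Write the RTF text to disk; the file name may end in .doc.
    public static void save(String rtf, String fileName) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(fileName), StandardCharsets.US_ASCII)) {
            writer.write(rtf);
        }
    }
}
```

Calling save(buildReport(inserted, deleted), "results.doc") should give a file that Word opens directly. Text outside the ASCII range would need RTF's own escape sequences, which this sketch does not attempt.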
>>> 
>>> Anyway, if I have the time today, I will make the changes suggested
>>> above and write the easy comparisons. I think the easy comparisons are
>>> paragraphs that have been added to the 'new' document, those that have
>>> been deleted from the original document and those that have been moved.
>>> The basic approach I am going to take is:
>>> 
>>> 1. Get a paragraph from the original document.
>>> 2. Try to match it with the paragraph at the same position in the new
>>> document. If a match is found here then no further action is necessary.
>>> 3. If a match cannot be found at the same location in the new document,
>>> start from the first paragraph and check every currently un-matched
>>> paragraph. If a match is found then mark this as a paragraph that has
>>> been moved. If a match cannot be found then mark it as a paragraph that
>>> has been deleted.
>>> 4. Once all of the paragraphs in the original document have been
>>> checked, any un-matched paragraphs that remain in the new document can
>>> be marked as new paragraphs or insertions.
>>> 
>>> When I say 'mark' in the above, I mean to create an instance of the new
>>> ComparisonResult class, copy over the reference to the Range and set the
>>> result of the comparison.
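
The four steps quoted above can be sketched roughly as follows. To keep the example self-contained it works on plain Strings (the paragraph text) rather than HWPF Range objects, and the names ParagraphMatchSketch, Change and Result are hypothetical stand-ins, not the posted ComparisonResult class. One small deviation from the steps as written: all same-position matches are made in a first pass, so that a moved paragraph cannot claim a slot that an unmoved paragraph would have matched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the matching steps; not the posted code.
public class ParagraphMatchSketch {

    public enum Change { INSERTED, DELETED, MOVED }

    // Stand-in for ComparisonResult: paragraph text plus the change type.
    public static final class Result {
        public final String text;
        public final Change change;

        Result(String text, Change change) {
            this.text = text;
            this.change = change;
        }

        @Override
        public String toString() {
            return change + ": " + text;
        }
    }

    public static List<Result> compare(List<String> original, List<String> revised) {
        List<Result> results = new ArrayList<Result>();
        boolean[] originalMatched = new boolean[original.size()];
        boolean[] revisedMatched = new boolean[revised.size()];

        // Steps 1 and 2: identical text at the same position needs no report.
        for (int i = 0; i < original.size() && i < revised.size(); i++) {
            if (original.get(i).equals(revised.get(i))) {
                originalMatched[i] = true;
                revisedMatched[i] = true;
            }
        }

        // Step 3: search the still un-matched revised paragraphs.
        for (int i = 0; i < original.size(); i++) {
            if (originalMatched[i]) {
                continue;
            }
            int found = -1;
            for (int j = 0; j < revised.size() && found < 0; j++) {
                if (!revisedMatched[j] && original.get(i).equals(revised.get(j))) {
                    found = j;
                }
            }
            if (found >= 0) {
                revisedMatched[found] = true;
                results.add(new Result(original.get(i), Change.MOVED));
            } else {
                results.add(new Result(original.get(i), Change.DELETED));
            }
        }

        // Step 4: whatever is left un-matched in the revised document is new.
        for (int j = 0; j < revised.size(); j++) {
            if (!revisedMatched[j]) {
                results.add(new Result(revised.get(j), Change.INSERTED));
            }
        }
        return results;
    }
}
```

Be aware that with purely positional matching, a single paragraph inserted near the top of the new document shifts everything below it, so those paragraphs would all be reported as moved.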
>>> 
>>> For now, I will not spend time looking at tables and will try to ignore
>>> the possible complexities of deciding if a paragraph has been modified.
>>> It may prove a step too far but I would also like to try adding the
>>> result producing step just to see if HWPF can produce a suitable report
>>> for us.
>>> 
>>> As before, will post again if I make any progress. Do not feel you
>>> cannot say 'stop' or 'wait I want to think about something' or that you
>>> cannot suggest changes, modifications or a completely different approach
>>> yourself if the current solution is veering away from your original
>>> requirement. Ideally, we should produce the solution together and I
>>> ought not to 'force' you into something; I am only too well aware of how
>>> easy it is to be swept along as a project gathers momentum.
>>> 
>>> Yours
>>> 
>>> Mark B
>>> 
>>> 
>>> bihag wrote:
>>>> 
>>>> Hi Mark,
>>>> 
>>>> I would sincerely like to convey my thanks to you.
>>>> The tips you have given are really helpful.
>>>> I appreciate your time and efforts.
>>>> 
>>>> Regards,
>>>> Bihag 
>>>> 
>>>> 
>>>> 
>>>> MSB wrote:
>>>>> 
>>>>> Have not had the time to do much work or ANY testing so please treat
>>>>> this with caution.
>>>>> 
>>>>> What I am proposing is that the contents of a Word document be
>>>>> converted into an ArrayList. That ArrayList will contain instances of
>>>>> the DocumentPart class and these will facilitate the comparison
>>>>> operation. I have not given those a great deal of thought yet but
>>>>> believe that we should check for any paragraphs - not tables yet -
>>>>> being inserted, deleted, modified (not sure how to proceed with this
>>>>> one yet) or moved. As you can see, I have provided constants in the
>>>>> DocumentPart class to support these different results. The comparison
>>>>> status flag is there to prevent a paragraph being checked again once a
>>>>> match has been found but I am thinking of another use if the logic
>>>>> holds.
>>>>> 
>>>>> As yet, I have not coded the compare methods or the save results
>>>>> method as I think it is wise to thoroughly test the loading method
>>>>> first. We need to be certain that the ArrayList of DocumentPart(s)
>>>>> accurately describes the documents. I think that you are in a 'better'
>>>>> time-zone and that you may have the opportunity to test the code
>>>>> before me. If you look at the main method of the DocumentComparator
>>>>> class, you will see how to run the code. All you need to do for now is
>>>>> make sure that the first two parameters to the compareDocuments()
>>>>> method point to Word files and then run the code. To check the
>>>>> results, you can either modify DocumentPart to add a toString() method
>>>>> that outputs the instance's contents or simply call the
>>>>> getParagraphText() and getCellContents() methods from the
>>>>> compareDocuments() method.
>>>>> 
>>>>> Anyway, here is the code so far. Have a look and see if it is the way
>>>>> you want to go - or think makes sense. Do not feel that you cannot
>>>>> criticise or alter the code or the approach as, for now we are not
>>>>> committed to any particular strategy, just exploring what is possible.
>>>>> 
>>>>> package comparedocuments;
>>>>> 
>>>>> import java.io.File;
>>>>> import java.io.FileInputStream;
>>>>> import java.util.ArrayList;
>>>>> import java.io.FileNotFoundException;
>>>>> import java.io.IOException;
>>>>> 
>>>>> import org.apache.poi.hwpf.HWPFDocument;
>>>>> import org.apache.poi.hwpf.usermodel.Range;
>>>>> import org.apache.poi.hwpf.usermodel.Paragraph;
>>>>> 
>>>>> /**
>>>>>  * An instance of this class can be used to perform a comparison
>>>>> between two
>>>>>  * binary (OLE2CDF) Microsoft Word documents.
>>>>>  *
>>>>>  * @author Mark B
>>>>>  * @version 1.00 27th July 2009
>>>>>  */
>>>>> public class DocumentComparator {
>>>>>     
>>>>>     /**
>>>>>      * Called to compare the two documents and output the results of
>>>>> the
>>>>>      * comparison to a third Microsoft Word document.
>>>>>      * 
>>>>>      * @param originalDoc The path to and name of the original
>>>>> document, the
>>>>>      *                    document that is the basis for the
>>>>> comparison.
>>>>>      * @param compareToDoc The path to and name of the document that
>>>>> should
>>>>>      *                     be compared with the original for any
>>>>> modifications.
>>>>>      * @param resultDoc The path to and name of the document that
>>>>> should contain
>>>>>      *                  the results of the comparison process.
>>>>>      * @param docTemplate The path to and name of the empty Word
>>>>> document that
>>>>>      *                    should be used as the basis for the results
>>>>> document.
>>>>>      * @throws java.io.IOException Thrown to signal that some sort of
>>>>> I/O
>>>>>      *                             Exception has occurred.
>>>>>      * @throws java.io.FileNotFoundException Thrown to signal that a
>>>>> file
>>>>>      *                                       could not be located.
>>>>>      */
>>>>>     public void compareDocuments(String originalDoc, String
>>>>> compareToDoc,
>>>>>                                  String resultDoc, String docTemplate)
>>>>>                                  throws IOException,
>>>>> FileNotFoundException {
>>>>>         ArrayList<DocumentPart> originalDocParts =
>>>>> this.loadDocument(originalDoc);
>>>>>         ArrayList<DocumentPart> compareToDocParts =
>>>>> this.loadDocument(compareToDoc);
>>>>>         this.compareDocs(originalDocParts, compareToDocParts);
>>>>>         this.saveResults(originalDocParts, compareToDocParts,
>>>>> resultDoc);
>>>>>     }
>>>>>     
>>>>>     /**
>>>>>      * Opens a named binary (OLE2CDF) Microsoft Word document and
>>>>> converts that
>>>>>      * documents contents into an ArrayList of instances of the
>>>>> DocumentPart
>>>>>      * class.
>>>>>      * @param docName The path to and name of a Microsoft Word
>>>>> document file.
>>>>>      * @return An instance of the ArrayList class encapsulating
>>>>> instances
>>>>>      *         of the DocumentPart class. Each DocumentPart will
>>>>> encapsulate
>>>>>      *         information about a paragraph of text or a table
>>>>> recovered from
>>>>>      *         the Microsoft Word document.
>>>>>      * @throws java.io.IOException If an I/O Exception occurs
>>>>>      * @throws java.io.FileNotFoundException Thrown to indicate that
>>>>> the
>>>>>      *                                       named Microsoft Word file
>>>>> could
>>>>>      *                                       not be located.
>>>>>      */
>>>>>     public ArrayList<DocumentPart> loadDocument(String docName)
>>>>>                                      throws IOException,
>>>>> FileNotFoundException {
>>>>>         File file = null;
>>>>>         FileInputStream fis = null;
>>>>>         HWPFDocument document = null;
>>>>>         Range overallRange = null;
>>>>>         Paragraph para = null;
>>>>>         int numParas = 0;
>>>>>         boolean inTable = false;
>>>>>         ArrayList<DocumentPart> docParts = new ArrayList<DocumentPart>();
>>>>>         try {
>>>>>             // Open the Word file.
>>>>>             file = new File(docName);
>>>>>             fis = new FileInputStream(file);
>>>>>             document = new HWPFDocument(fis);
>>>>>             // Get the overall Range for the document and the number
>>>>>             // of paragraphs from this Range.
>>>>>             overallRange = document.getOverallRange();
>>>>>             numParas = overallRange.numParagraphs();
>>>>>             for(int i = 0; i < numParas; i++) {
>>>>>                 para = overallRange.getParagraph(i);
>>>>>                 // Is the paragraph 'in' a table? If so, it is possible
>>>>>                 // to recover a reference to that Table from the first
>>>>>                 // paragraph only. If calls are made to the getTable()
>>>>>                 // method using subsequent paragraphs then an exception
>>>>>                 // will be thrown. So, after getting the Table, a flag
>>>>>                 // is set to prevent further calls to getTable().
>>>>>                 if(para.isInTable()) {
>>>>>                     if(!inTable) {
>>>>>                         // Get a reference to the Table and pass it
>>>>>                         // to the constructor of the DocumentPart
>>>>>                         // class. Add the DocumentPart instance to
>>>>>                         // the ArrayList.
>>>>>                         docParts.add(new DocumentPart(
>>>>>                                 overallRange.getTable(para)));
>>>>>                         inTable = true;
>>>>>                     }
>>>>>                 }
>>>>>                 // The paragraph is not in a table so simply add a new
>>>>>                 // instance to the ArrayList that encapsulates the
>>>>>                 // paragraph of text.
>>>>>                 else {
>>>>>                     docParts.add(new DocumentPart(para));
>>>>>                     inTable = false;
>>>>>                 }
>>>>>             }
>>>>>             return(docParts);
>>>>>         }
>>>>>         finally {
>>>>>             if(fis != null) {
>>>>>                 try {
>>>>>                   fis.close();  
>>>>>                 }
>>>>>                 catch(IOException ioEx) {
>>>>>                     // I G N O R E
>>>>>                 }
>>>>>             }
>>>>>         }
>>>>>     }
>>>>>     
>>>>>     public void compareDocs(ArrayList<DocumentPart> originalDocParts,
>>>>>                             ArrayList<DocumentPart> compareToDocParts)
>>>>> {
>>>>>         // TO DO: Code comparison
>>>>>     }
>>>>>     
>>>>>     public void saveResults(ArrayList<DocumentPart> originalDocParts,
>>>>>                             ArrayList<DocumentPart> compareToDocParts,
>>>>>                             String resultDoc)
>>>>>                                      throws IOException,
>>>>> FileNotFoundException {
>>>>>         // TO DO: Code saving of results.
>>>>>     }
>>>>> 
>>>>>     /**
>>>>>      * Main entry point to the program.
>>>>>      *
>>>>>      * @param args
>>>>>      */
>>>>>     public static void main(String[] args) {
>>>>>         try {
>>>>>             DocumentComparator docComp = new DocumentComparator();
>>>>>             docComp.compareDocuments("original document",
>>>>>                                      "compare to document",
>>>>>                                      "results document",
>>>>>                                      "results document template");
>>>>>         }
>>>>>         catch(FileNotFoundException fnfEx) {
>>>>>             // TO DO: Code exception handling.
>>>>>         }
>>>>>         catch(IOException ioEx) {
>>>>>             // TO DO: Code exception handling.
>>>>>         }
>>>>>     }
>>>>> }
>>>>> 
>>>>> package comparedocuments;
>>>>> 
>>>>> import org.apache.poi.hwpf.usermodel.Range;
>>>>> import org.apache.poi.hwpf.usermodel.Paragraph;
>>>>> import org.apache.poi.hwpf.usermodel.Table;
>>>>> import org.apache.poi.hwpf.usermodel.TableRow;
>>>>> 
>>>>> /**
>>>>>  * Encapsulates a 'part' of a Microsoft Word document. Currently, that
>>>>> part can
>>>>>  * either be a Table or a paragraph of text.
>>>>>  *
>>>>>  * @author Mark B
>>>>>  * @version 1.00 27th July 2009.
>>>>>  */
>>>>> public class DocumentPart {
>>>>> 
>>>>>     private Range docPart = null;
>>>>>     private boolean comparisonStatus = false;
>>>>>     private int comparisonResult = 0;
>>>>> 
>>>>>     public static final int INSERTED = 0;
>>>>>     public static final int DELETED = 1;
>>>>>     public static final int MODIFIED = 2;
>>>>>     public static final int UN_MODIFIED = 3;
>>>>>     public static final int MOVED = 4;
>>>>> 
>>>>>     /**
>>>>>      * Create a new instance of the DocumentPart class using the
>>>>> following
>>>>>      * parameter.
>>>>>      *
>>>>>      * @param docPart An instance of the
>>>>> org.apache.poi.hwpf.usermodel.Range
>>>>>      *                class that will encapsulate an instance of the
>>>>>      *                org.apache.poi.hwpf.usermodel.Paragraph or an
>>>>> instance
>>>>>      *                of the org.apache.poi.hwpf.usermodel.Table
>>>>> class.
>>>>>      */
>>>>>     public DocumentPart(Range docPart) {
>>>>>         this.docPart = docPart;
>>>>>         // Note that as the part has not been successfully compared
>>>>>         // to another part the status is false.
>>>>>         this.comparisonStatus = false;
>>>>>         // and that the type is set to un-modified. Any parts that
>>>>>         // have not been checked or that are not un-modified will be
>>>>>         // written away to the results document.
>>>>>         this.comparisonResult = DocumentPart.UN_MODIFIED;
>>>>>     }
>>>>> 
>>>>>     /**
>>>>>      * Has a match been found for this document part?
>>>>>      *
>>>>>      * @return A boolean value that indicates whether a match was
>>>>> found between
>>>>>      *         two document parts.
>>>>>      */
>>>>>     public boolean isMatched() {
>>>>>         return(this.comparisonStatus);
>>>>>     }
>>>>> 
>>>>>     /**
>>>>>      * Get the result of the comparison.
>>>>>      *
>>>>>      * @return A primitive int value that indicates the result of
>>>>> comparing
>>>>>      *         this document part to others. The following constants
>>>>> have been
>>>>>      *         declared;
>>>>>      *             DocumentPart.INSERTED = 0;
>>>>>      *             DocumentPart.DELETED = 1;
>>>>>      *             DocumentPart.MODIFIED = 2;
>>>>>      *             DocumentPart.UN_MODIFIED = 3;
>>>>>      *             DocumentPart.MOVED = 4;
>>>>>      *
>>>>>      */
>>>>>     public int getComparisonResult() {
>>>>>         return(this.comparisonResult);
>>>>>     }
>>>>> 
>>>>>     /**
>>>>>      * Store the result of the comparison between document parts.
>>>>>      *
>>>>>      * @param comparisonResult A primitive int whose value indicates
>>>>> the result
>>>>>      *                         of comparing one document part with
>>>>> others.
>>>>>      */
>>>>>     public void setComparisonResult(int comparisonResult) {
>>>>>         this.comparisonResult = comparisonResult;
>>>>>     }
>>>>> 
>>>>>     /**
>>>>>      * Does a DocumentPart encapsulate a table?
>>>>>      * @return A primitive boolean value; true if the DocumentPart
>>>>> encapsulates
>>>>>      *         a Table, false otherwise.
>>>>>      */
>>>>>     public boolean isTable() {
>>>>>         return(this.docPart instanceof Table);
>>>>>     }
>>>>> 
>>>>>     /**
>>>>>      * If the DocumentPart encapsulates a Table, get the number of
>>>>> rows in the
>>>>>      * table.
>>>>>      *
>>>>>      * @return A primitive int whose value indicates how many rows
>>>>> there are in
>>>>>      *         the table.
>>>>>      * @throws java.lang.UnsupportedOperationException Thrown if this
>>>>> method is
>>>>>      *         called for a DocumentPart instance that encapsulates a
>>>>> Paragraph.
>>>>>      */
>>>>>     public int getNumRows() throws UnsupportedOperationException {
>>>>>         int numRows = 0;
>>>>>         if(this.isTable()) {
>>>>>             Table table = (Table)this.docPart;
>>>>>             numRows = table.numRows();
>>>>>         }
>>>>>         else {
>>>>>             throw new UnsupportedOperationException("The DocumentPart " +
>>>>>                     "does not encapsulate a Table.");
>>>>>         }
>>>>>         return(numRows);
>>>>>     }
>>>>> 
>>>>>     /**
>>>>>      * How many columns are there in the Table. This method assumes
>>>>> that the
>>>>>      * table is 'square', i.e. that each row of the Table holds the
>>>>> same number
>>>>>      * of columns.
>>>>>      *
>>>>>      * @return A primitive int whose value indicates how many columns
>>>>> there are
>>>>>      *         in the Table.
>>>>>      * @throws java.lang.UnsupportedOperationException Thrown if this
>>>>> method is
>>>>>      *         called for a DocumentPart instance that encapsulates a
>>>>> Paragraph.
>>>>>      */
>>>>>     public int getNumColumns() throws UnsupportedOperationException {
>>>>>         return(this.getNumColumns(0));
>>>>>     }
>>>>> 
>>>>>     /**
>>>>>      * How many columns are there in a specific row of the Table.
>>>>>      *
>>>>>      * @return A primitive int whose value indicates how many columns
>>>>> there are
>>>>>      *         in the Table row.
>>>>>      * @throws java.lang.UnsupportedOperationException Thrown if this
>>>>> method is
>>>>>      *         called for a DocumentPart instance that encapsulates a
>>>>> Paragraph.
>>>>>      */
>>>>>     public int getNumColumns(int rowNum) throws
>>>>> UnsupportedOperationException {
>>>>>         int numColumns = 0;
>>>>>         if(this.isTable()) {
>>>>>             Table table = (Table)this.docPart;
>>>>>             TableRow row = table.getRow(rowNum);
>>>>>             numColumns = row.numCells();
>>>>>         }
>>>>>         else {
>>>>>             throw new UnsupportedOperationException("The DocumentPart " +
>>>>>                     "does not encapsulate a Table.");
>>>>>         }
>>>>>         return(numColumns);
>>>>>     }
>>>>> 
>>>>>     /**
>>>>>      * Return the contents of a specific cell.
>>>>>      *
>>>>>      * @param rowNum A primitive int that indicates the row the cell
>>>>> is on.
>>>>>      *               Remember that row indices are zero based.
>>>>>      * @param colNum A primitive int that indicates the column the
>>>>> cell is in.
>>>>>      *               Remember that column indices are zero based.
>>>>>      * @return An instance of the String class that encapsulates the
>>>>> cells
>>>>>      *         contents
>>>>>      * @throws java.lang.UnsupportedOperationException Thrown if this
>>>>> method is
>>>>>      *         called for a DocumentPart instance that encapsulates a
>>>>> Paragraph.
>>>>>      */
>>>>>     public String getCellContents(int rowNum, int colNum)
>>>>>                                           throws
>>>>> UnsupportedOperationException {
>>>>>         return(null);
>>>>>     }
>>>>> 
>>>>>     /**
>>>>>      * Return the text of the Paragraph.
>>>>>      *
>>>>>      * @return An instance of the String class that encapsulates the
>>>>> text
>>>>>      *         the Paragraph contained. Note that this will be
>>>>> stripped of
>>>>>      *         all fields.
>>>>>      * @throws java.lang.UnsupportedOperationException Thrown if this
>>>>> method is
>>>>>      *         called for a DocumentPart instance that encapsulates a
>>>>> Table.
>>>>>      */
>>>>>     public String getParagraphText() throws
>>>>> UnsupportedOperationException {
>>>>>         String returnValue = null;
>>>>>         if(!this.isTable()) {
>>>>>             Paragraph para = (Paragraph)this.docPart;
>>>>>             returnValue = Range.stripFields(para.text());
>>>>>         }
>>>>>         else {
>>>>>             throw new UnsupportedOperationException("The DocumentPart " +
>>>>>                     "does not encapsulate a Paragraph.");
>>>>>         }
>>>>>         return(returnValue);
>>>>>     }
>>>>> }
>>>>> 
>>>>> 
>>>>> 
>>>>> bihag wrote:
>>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> We want to compare two documents and highlight whatever is not common
>>>>>> to both with some color or in some other way ... So I think we have to
>>>>>> merge the documents or create a new document which has the content of
>>>>>> both, and show the differences with some color, like deleted in red,
>>>>>> newly added in blue ...
>>>>>> 
>>>>>> Mainly we are looking for an OLE2CDF doc compare solution ...
>>>>>> 
>>>>>> Please provide a code snippet if possible ...
>>>>>> 
>>>>>> Thanking you in advance ...
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/How-to-compare-2-word-doc-%28OLE2CDF-or-OpenXML%29.-tp24673506p24701804.html
Sent from the POI - Dev mailing list archive at Nabble.com.