poi-dev mailing list archives

From Jason Harrop <jhar...@gmail.com>
Subject Re: [Discussion] Generating MS Word documents based on templates
Date Sat, 17 Jan 2015 02:06:08 GMT
Of all the technologies available in Word which can be used as the basis of
a document generation system, I have long believed content control data
binding is most robust.  See further [1].

The basic concept is that you keep your data in an XML file (with or
without a schema of your choosing), and bind content controls to elements
via XPath, so that the content of the document and the XML are kept in sync.
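
A minimal sketch of the binding idea, using only the JDK's XPath API rather than Word or docx4j. The class name, the `resolve` helper and the sample invoice XML are all hypothetical, made up for illustration; in a real docx the XPath would sit in the content control's data binding, and Word would do the lookup itself.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class BindingSketch {

    // Hypothetical helper: evaluates the XPath a content control is bound to
    // and returns the text that the control would display.
    static String resolve(String xml, String xpath) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            return xp.evaluate(xpath, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Cannot evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample data; any XML of your choosing would do.
        String xml = "<invoice><customer><name>Acme Pty Ltd</name></customer>"
                   + "<total>120.50</total></invoice>";
        System.out.println(resolve(xml, "/invoice/customer/name"));
        System.out.println(resolve(xml, "/invoice/total"));
    }
}
```

Each call stands in for one bound content control: change the XML and the resolved text changes with it, which is the "kept in sync" behaviour described above.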

Content control data binding is part of the OpenXML spec, and Microsoft
Word has supported the above since 2007.

There are 2 fundamental features a document assembly/generation system
needs, which the 2007 implementation didn't explicitly/fully support:

- repeating data (explicit support in Word 2013)
- conditional content

You can still handle those things using Word 2007 content controls though;
I published the OpenDoPE conventions [2] to explain how.
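
One way to picture these two features, assuming both are driven by XPath expressions over the same XML data. This is only a sketch with the JDK's XPath API; it is not the OpenDoPE markup and not docx4j's implementation, and the class, method names and sample order XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConditionRepeatSketch {

    // Evaluates an XPath against the XML string, returning the requested type.
    static Object eval(String xml, String xpath, QName type) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Cannot evaluate " + xpath, e);
        }
    }

    // Conditional content: keep the wrapped content only if the XPath is true.
    static boolean include(String xml, String conditionXPath) {
        return (Boolean) eval(xml, conditionXPath, XPathConstants.BOOLEAN);
    }

    // Repeating data: one copy of the wrapped content per matching node.
    static List<String> repeat(String xml, String repeatXPath) {
        NodeList nodes = (NodeList) eval(xml, repeatXPath, XPathConstants.NODESET);
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            rows.add(nodes.item(i).getTextContent());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed sample data for illustration only.
        String xml = "<order><item>Widget</item><item>Gadget</item>"
                   + "<express>true</express></order>";
        System.out.println(include(xml, "/order/express = 'true'"));
        System.out.println(repeat(xml, "/order/item"));
    }
}
```

A template processor built this way would drop the content wrapped by a condition when `include` is false, and clone the content wrapped by a repeat once per entry returned by `repeat`.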

Fast forward to now (early 2015), and Microsoft still hasn't baked in a way
of handling conditional content.

And I don't think the time is yet ripe for adopting their repeatingSection
element, mainly because many organisations still use Word 2010, and 2010
drops that element without warning!  See further [3].

docx4j (ASLv2, which I maintain) contains an implementation of the
OpenDoPE conventions: given your XML data and a docx template containing
suitable content controls, it will do the processing to give you a
resolved output docx.

That code is used in various significant production installations,
including large scale social security correspondence generation in a
government department in North America.

I'd be happy to see that code form the basis of a new top level project,
and see it used with docx4j or POI (or even Aspose) - though because each
of these represents the Open XML elements using different objects, some work
would be required to make it implementation neutral.

The main problem with document generation is that there are hundreds
(possibly even thousands) of different solutions - some of which are open
source - but the source templates are not interoperable, so users get
locked into a particular vendor's implementation.

So a major benefit of a top level Apache project might be to encourage
standardisation on a source template format, as existing vendors provide
tools for converting to/from it, and new/emerging vendors adopt the format.

As Harry noted, you need to give template authors easy-to-use tools to
create their templates, or authoring remains a bottleneck.  There are Word
Add-Ins for authoring OpenDoPE compliant templates which could be used as a
starting point (though if the format adopted by the project became popular,
you could see a variety of authoring tools becoming available, much like
you have for HTML).

cheers .. Jason

[1] http://www.slideshare.net/plutext/document-generation-2012osdcsydney

[2] http://www.opendope.org/


On Fri, Jan 16, 2015 at 12:04 AM, Freivogel Oliver <oliver.freivogel@born.ch> wrote:

> Hi Harry
> For positioning and formatting the dynamic parts in the template we used
> the content controls introduced in Office 2007. Our component is able to
> create a basic template with all the supported dynamic elements for a given
> data structure. The editor of the template can then move, copy and paste or
> remove these elements. There is also a content control element for
> iterating over a collection. Currently our component does not support it,
> but I am sure this would be a great new feature.
> Oliver

> -----Original Message-----
> From: Harry Zhou [mailto:superharry@gmail.com]
> Sent: Thursday, 15 January 2015 00:28
> To: POI Developers List
> Subject: Re: [Discussion] Generating MS Word documents based on templates
> Hi Oliver,
> We built an internal tool similar to your description using a combination
> of Apache POI and Freemarker (the web framework for UI is Apache
> Tapestry).  Output documents we need are DOCX.
> The hard part, as you probably already know, is to give users the ability
> to manipulate templates.  Document assembly process is pretty
> straightforward.
> So yes, at least for our simple internal tool, Apache POI works.  Not
> familiar with docx4j so can't speak to that.
> Harry
