References: <31383b166b4644958d262b5e46335047@CHBEHVS002.chbe01.local>
Date: Sat, 17 Jan 2015 13:06:08 +1100
Subject: Re: [Discussion] Generating MS Word documents based on templates
From: Jason Harrop
To: POI Developers List
Content-Type: multipart/alternative; boundary=f46d04426a604e9716050ccf88ed

--f46d04426a604e9716050ccf88ed
Content-Type: text/plain; charset=UTF-8

Of all the technologies available in Word which can be used as the basis of a document generation system, I have long believed content control data binding is the most robust. See further [1].

The basic concept is that you keep your data in an XML file (with or without a schema of your choosing), and bind content controls to elements via XPath, so that the content of the document and the XML are kept in sync.

Content control data binding is part of the Open XML spec, and Microsoft Word has supported the above since 2007.

There are 2 fundamental features a document assembly/generation system needs, which the 2007 implementation didn't explicitly/fully support:

- repeating data (explicit support in Word 2013)
- conditional content

You can still handle those things using Word 2007 content controls though; I published the OpenDoPE conventions [2] to explain how.

Fast forward to now (early 2015), and Microsoft still hasn't baked in a way of handling conditional content. And I don't think the time is yet ripe for adopting their repeatingSection element, mainly because many organisations still use Word 2010, and 2010 drops that element without warning! See further [3].

docx4j (ASLv2, which I maintain) contains an implementation of the OpenDoPE conventions.
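
To make the binding idea concrete, here is a minimal sketch in plain Java, using only the JDK XML APIs. It is not docx4j or OpenDoPE code: the class name BindingSketch, the control tags and the sample XML are all made up for illustration. Each hypothetical control tag maps to an XPath, which is evaluated against the data to get the text that control would display. The count() and boolean() entries only hint at how repeats and conditions could be driven from the same data; they are not the OpenDoPE syntax.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative only: resolves hypothetical "control tag -> XPath" bindings
// against XML data. This is NOT the docx4j API or the OpenDoPE format.
final class BindingSketch {

    // Returns, for each control tag, the string value its XPath selects in the data.
    static Map<String, String> resolve(String xml, Map<String, String> bindings) throws Exception {
        Document data = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> binding : bindings.entrySet()) {
            // evaluate(expression, item) returns the result converted to a string;
            // an XPath that matches nothing yields an empty string.
            resolved.put(binding.getKey(), xpath.evaluate(binding.getValue(), data));
        }
        return resolved;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data.
        String xml = "<invoice><customer><name>Jane Doe</name></customer>"
                + "<item>Widget</item><item>Gadget</item></invoice>";

        // Hypothetical bindings: control tag -> XPath into the data.
        Map<String, String> bindings = new LinkedHashMap<>();
        bindings.put("customerName", "/invoice/customer/name");
        bindings.put("itemCount", "count(/invoice/item)");     // could drive a repeat
        bindings.put("hasItems", "boolean(/invoice/item)");    // could drive a condition

        resolve(xml, bindings).forEach((tag, value) -> System.out.println(tag + " = " + value));
    }
}
```

In a real template the XPath is stored on the content control inside the docx, and Word (or a processing library) does the evaluation; the map above just stands in for that.
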
Given your XML data and a docx template containing suitable content controls, docx4j will do the processing to give you a resolved output docx. That code is used in various significant production installations, including large scale social security correspondence generation in a government department in North America.

I'd be happy to see that code form the basis of a new top level project, and see it used with docx4j or POI (or even Aspose) - though because each of these represents the Open XML elements using different objects, some work would be required to make it implementation neutral.

The main problem with document generation is that there are hundreds (possibly even thousands) of different solutions - some of which are open source - but the source templates are not interoperable, so users get locked into a particular vendor's implementation.

So a major benefit of a top level Apache project might be to encourage standardisation on a source template format, as existing vendors provide tools for converting to/from it, and new/emerging vendors adopt the format.

As Harry noted, you need to give template authors easy-to-use tools to create their templates, or authoring remains a bottleneck. There are Word Add-Ins for authoring OpenDoPE compliant templates which could be used as a starting point (though if the format adopted by the project became popular, you could see a variety of authoring tools becoming available, much like you have for HTML).

cheers .. Jason

[1] http://www.slideshare.net/plutext/document-generation-2012osdcsydney
[2] http://www.opendope.org/
[3] http://www.docx4java.org/blog/2015/01/word-2013-repeatingsection-content-controls-ready-for-prime-time/

On Fri, Jan 16, 2015 at 12:04 AM, Freivogel Oliver wrote:

> Hi Harry
>
> For positioning and formatting the dynamic parts in the template we used
> the content controls introduced in Office 2007.
> Our component is able to create a basic template with all the supported
> dynamic elements for a given data structure. The editor of the template
> can then move, copy and paste, or remove these elements. There is also a
> content control element for iterating over a collection. Currently our
> component does not support it, but I am sure this would be a great new
> feature.
>
> Oliver
>
>
> -----Original Message-----
> From: Harry Zhou [mailto:superharry@gmail.com]
> Sent: Thursday, 15 January 2015 00:28
> To: POI Developers List
> Subject: Re: [Discussion] Generating MS Word documents based on templates
>
> Hi Oliver,
>
> We built an internal tool similar to your description using a combination
> of Apache POI and Freemarker (the web framework for UI is Apache
> Tapestry). Output documents we need are DOCX.
>
> The hard part, as you probably already know, is to give users the ability
> to manipulate templates. The document assembly process is pretty
> straightforward.
>
> So yes, at least for our simple internal tool, Apache POI works. Not
> familiar with docx4j so can't speak to that.
>
> Harry
>

--f46d04426a604e9716050ccf88ed--