From: Peter Kelly
To: dev@corinthia.incubator.apache.org
Subject: Re: ODF to HTML
Date: Mon, 1 Jun 2015 23:43:35 +0700
Message-Id: <0AEDA1A1-E3CB-4173-ABF1-511D030F99B3@apache.org>

(apologies if this is a duplicate - I sent from the wrong email address
before)

> On 31 May 2015, at 7:09 pm, Ian C wrote:
>
> We also need to take into account the style hierarchy. I see from some
> of the CSS documentation that there are mechanisms in place to manage
> that but have not looked in detail. Any advice Peter?

First, some general comments. I recommend first building a custom data
structure representing all the styles, which can later be queried when
needed, e.g. when you encounter an element in content.xml that has a
particular style associated with it.

In the Word filter, there are two classes used for this purpose:
WordSheet and WordStyle (the former being a collection of the latter).
These are defined in WordSheet.h and WordSheet.c. Early in the
conversion process, the filter goes through the XML document containing
the styles and builds up this data structure. This lets the rest of the
code deal with styles at a higher level of abstraction than examining
the DOM tree of styles.xml directly.
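
As a rough illustration of the "build once, query later" idea, here is a minimal sketch in Python rather than the C used by the filters. The names Style, Sheet and build_sheet are hypothetical and are not the WordSheet/WordStyle API. It assumes ODF-style input, where each named style is a style:style element carrying style:name, style:family and (optionally) style:parent-style-name attributes, and it shows one possible way to answer style-hierarchy queries.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# ODF "style" namespace URI, as used in styles.xml
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


@dataclass
class Style:
    """Hypothetical record for one named style."""
    name: str
    family: str
    parent: Optional[str] = None


@dataclass
class Sheet:
    """Hypothetical collection of styles, keyed by (family, name)."""
    styles: Dict[Tuple[str, str], Style] = field(default_factory=dict)

    def add(self, style: Style) -> None:
        self.styles[(style.family, style.name)] = style

    def lookup(self, family: str, name: str) -> Optional[Style]:
        return self.styles.get((family, name))

    def ancestry(self, family: str, name: str) -> List[str]:
        """Return style names from the given style up through its parents."""
        chain: List[str] = []
        seen = set()  # guards against accidental parent cycles
        style = self.lookup(family, name)
        while style is not None and style.name not in seen:
            chain.append(style.name)
            seen.add(style.name)
            style = self.lookup(family, style.parent) if style.parent else None
        return chain


def build_sheet(styles_xml: str) -> Sheet:
    """Walk the DOM once and record every style:style element."""
    root = ET.fromstring(styles_xml)
    sheet = Sheet()
    for elem in root.iter(f"{{{STYLE_NS}}}style"):
        sheet.add(Style(
            name=elem.get(f"{{{STYLE_NS}}}name"),
            family=elem.get(f"{{{STYLE_NS}}}family"),
            parent=elem.get(f"{{{STYLE_NS}}}parent-style-name"),
        ))
    return sheet


# Made-up sample input, trimmed to the attributes used above
SAMPLE = """<office:document-styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0">
  <office:styles>
    <style:style style:name="Standard" style:family="paragraph"/>
    <style:style style:name="Heading" style:family="paragraph"
                 style:parent-style-name="Standard"/>
    <style:style style:name="Heading_20_1" style:family="paragraph"
                 style:parent-style-name="Heading"/>
  </office:styles>
</office:document-styles>"""

sheet = build_sheet(SAMPLE)
print(sheet.ancestry("paragraph", "Heading_20_1"))
# prints ['Heading_20_1', 'Heading', 'Standard']
```

Keying by (family, name) is a design choice made here because ODF style names are only unique within a family; code that later walks content.xml can then resolve a style reference without touching the styles DOM again.
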

A while ago I made a start on the same thing for ODF: there are ODFSheet
and ODFStyle classes defined for the same purpose. So a good next step
for tackling styles would be to traverse the DOM tree of styles.xml and
populate this data structure, creating a new ODFStyle object for each
style in the document and adding it to the (single) ODFSheet object for
the document. This data structure could then be used to generate the CSS
text, as is done in the Word filter.

> I just generated to div tags do we want that? Mapping to h1... hn
> could be a better way but not sure how to really map the correct
> heading styles to the hn.

In the case of ODF, the information about which heading to map to is
(usually) available more directly than in OOXML. Both specs refer to it
as the “outline level”. In an ODF document, heading outline levels start
from 1 (just like HTML), but you also have the distinction between
<text:h> and <text:p> elements, so you can tell whether something is a
heading or a regular paragraph.

When encountering a <text:h> element, you can determine the outline
level from the text:outline-level attribute, e.g.:

    <text:h text:style-name="Heading_20_1" text:outline-level="1">Headline One</text:h>

So here the value ‘1’ is sufficient information to indicate that you
need to create an h1 element. The style-name attribute is Heading_20_1,
so the corresponding CSS would need to be:

    h1.Heading_20_1 { }

and similarly for other levels, e.g.

    h2.Heading_20_2 { }

Note that, as with your existing code, this would be generated
separately from the content itself, solely based on the information in
styles.xml, for the non-automatic styles.

So I suggest separating buildCSS_Styles into two functions: one which
populates the ODFSheet object associated with the package (that is,
package->sheet, which I think is already created), and another which
examines the ODFSheet object and populates the CSSSheet object.

--
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
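
The heading mapping described in this message can be sketched as follows. This is an illustration in Python, not code from the ODF filter (which is written in C). The helper names heading_tag, css_selector and selector_for are hypothetical. It assumes headings appear as ODF text:h elements with text:outline-level and text:style-name attributes. Clamping levels above 6 to h6 is an assumption added here, since HTML stops at h6; the message does not say how deeper levels should be handled.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ODF "text" namespace URI, as used in content.xml
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def heading_tag(outline_level) -> str:
    """Map an outline level (1-based) to an HTML heading tag.

    Assumption: levels outside 1..6 are clamped into that range.
    """
    level = min(max(int(outline_level), 1), 6)
    return f"h{level}"


def css_selector(tag: str, style_name: Optional[str]) -> str:
    """Build a selector such as 'h1.Heading_20_1'; bare tag if unstyled."""
    return f"{tag}.{style_name}" if style_name else tag


def selector_for(elem: ET.Element) -> str:
    """Return the CSS selector for a text:h or text:p element."""
    style_name = elem.get(f"{{{TEXT_NS}}}style-name")
    if elem.tag == f"{{{TEXT_NS}}}h":
        # Headings carry their level directly; default to 1 if absent
        tag = heading_tag(elem.get(f"{{{TEXT_NS}}}outline-level", "1"))
    else:
        tag = "p"
    return css_selector(tag, style_name)


heading = ET.fromstring(
    '<text:h xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'text:style-name="Heading_20_1" text:outline-level="1">Headline One</text:h>'
)
print(selector_for(heading))  # prints h1.Heading_20_1
```

The same css_selector helper could be driven from the style data alone (for example, once per heading style found in styles.xml), which matches the point that the CSS is generated separately from the content.
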