From: Peter Kelly
To: dev@corinthia.incubator.apache.org
Subject: Re: Using regex.h?
Date: Tue, 12 May 2015 14:24:41 +0700

> On 12 May 2015, at 1:07 am, Gabriela Gibson wrote:
>
> In my gbg_test.c file, I have produced the following monstrosity:
>
> Tag text_h(DFNode *node)
> {
>     char *s = node->attrs->value;

Referencing node->attrs->value is incorrect, as you don’t know whether the node will have any attributes, and if it does, whether the style name (which is what I assume you’re looking for here) will happen to be the first one. DFGetAttribute(node,TEXT_STYLE_NAME) is how you would get this reliably.

>     if ((int)s[11] > 55 || strlen(s) == 13)

I saw a quote once that went something along the lines of “C gives you enough rope to hang yourself, and then a bit extra just to make sure”. The ability to index into arrays arbitrarily, without any bounds checking, is one of the many strands of such rope ;)

This code makes the assumption that the style name will contain at least 12 characters. If it doesn’t, s[11] will be some random value, and the test will randomly do the wrong thing based on whatever happened to be in that part of memory as a result of previous stuff the program did.
Or crash, if you’re unlucky enough to be handed a string that is right near the end of an allocated block of memory.

What is 55? I had to look that up in an ASCII chart. OK, it’s the character code for '7'. In C, you can use character literals and integers interchangeably, so if you were going to do such a comparison (which is the wrong approach here, see below), you should have >= '7'.

>         return HTML_H6;
>     else
>         return HTML_H1 + (int)s[11] - 49;

In ODF, we can’t rely on style names to determine the heading level, because it’s perfectly legal to call them something other than Heading_20_n, which is what OpenOffice seems to do by default. The text:h element has an outline-level attribute; this indicates the level of the heading. So you should get the value of the TEXT_OUTLINE_LEVEL attribute and use that to determine which HTML heading tag to use.

I’m not sure what the best way of dealing with outline levels beyond 7 is. I’d suggest for now just making that a normal paragraph.

> Because I will need to make more such things to match the attribute
> values, I'm wondering if we could use regex.h instead, or if that is
> too unix specific and not available on other platforms.

I don’t believe it’s available on Windows. At any rate, I would suggest avoiding regular expressions in the codebase unless there’s a really compelling need. If you come across other situations where you think a regex would be appropriate let me know; from what I’ve seen of ODF I think we should be able to get away without them.

--
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
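
For illustration, here is a minimal sketch of the outline-level approach suggested in the message: parse the attribute text as a number and pick the heading tag from it, rather than indexing into a style name. Everything named here is an assumption made so the example stands alone: the Tag enum is a stand-in for the project's real tag constants, and heading_tag_for_outline_level is a hypothetical helper. In the actual codebase the input string would presumably come from DFGetAttribute(node,TEXT_OUTLINE_LEVEL). Since HTML only defines h1 to h6, the sketch treats any level above 6 (and any missing or malformed value) as a normal paragraph.

```c
#include <stdlib.h>

/* Stand-in for the project's tag constants; the values are illustrative only. */
typedef enum {
    HTML_P = 0,
    HTML_H1, HTML_H2, HTML_H3, HTML_H4, HTML_H5, HTML_H6
} Tag;

/* Hypothetical helper: maps the text of an outline-level attribute to a tag.
   A missing, empty, non-numeric, or out-of-range value falls back to HTML_P,
   so no character of the string is read without first being validated. */
Tag heading_tag_for_outline_level(const char *value)
{
    if (value == NULL || *value == '\0')
        return HTML_P;

    char *end = NULL;
    long level = strtol(value, &end, 10);

    /* Reject trailing garbage ("3x") and levels outside 1..6. */
    if (*end != '\0' || level < 1 || level > 6)
        return HTML_P;

    return (Tag)(HTML_H1 + (level - 1));
}
```

With these stand-ins, "3" maps to HTML_H3, while "7", "" and "abc" all map to HTML_P. No regular expression and no fixed string offset is needed.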