corinthia-dev mailing list archives

From Peter Kelly <pmke...@apache.org>
Subject Re: Using regex.h?
Date Tue, 12 May 2015 07:24:41 GMT
> On 12 May 2015, at 1:07 am, Gabriela Gibson <gabriela.gibson@gmail.com> wrote:
> 
> In my gbg_test.c file, I have produced the following monstrosity:
> 
> Tag text_h(DFNode *node)
> {
>    char *s = node->attrs->value;

Referencing node->attrs->value is incorrect, as you don’t know whether the node will
have any attributes, and if it does, whether the style name (which is what I assume you’re
looking for here) will happen to be the first one. DFGetAttribute(node, TEXT_STYLE_NAME) is
how you would get this reliably.

>    if ((int)s[11] > 55 || strlen(s) == 13)

I saw a quote once that went something along the lines of “C gives you enough rope to hang
yourself, and then a bit extra just to make sure”. The ability to index into arrays arbitrarily,
without any bounds checking, is one of the many strands of such rope ;)

This code makes the assumption that the style name will contain at least 12 characters. If
it doesn’t, s[11] will be some random value, and the test will randomly do the wrong thing
based on whatever happened to be in that part of memory as a result of previous stuff the
program did. Or crash, if you’re unlucky enough to be handed a string that is right near
the end of an allocated block of memory.

What is 55? I had to look that up in an ASCII chart. OK, it’s the character code for '7'.
In C, you can use character literals and integers interchangeably, so if you were going to do
such a comparison (which is the wrong approach here, see below), you should have > '7'.
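
To make the two points above concrete, here is a minimal sketch of the same test with a
length check and a character literal. The helper names char_at and digit_above_seven are
made up for illustration and are not part of the codebase, and this is still not the
recommended way to find the heading level.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the character at index i, or '\0' when s is
 * NULL or too short to have a character at that index. */
static char char_at(const char *s, size_t i)
{
    if (s == NULL || strlen(s) <= i)
        return '\0';
    return s[i];
}

/* The quoted test on s[11], rewritten with a bounds check and the
 * character literal '7' in place of the magic number 55. */
static int digit_above_seven(const char *s)
{
    return char_at(s, 11) > '7';
}
```

With this, a short name such as "Title" simply fails the test instead of reading memory
past the end of the string.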

>        return HTML_H6;
>    else
>        return HTML_H1 + (int)s[11] - 49;

In ODF, we can’t rely on style names to determine the heading level, because it’s perfectly
legal to call them something other than Heading_20_n, which is what OpenOffice seems to do
by default. The text:h element has an outline-level attribute; this indicates the level of
the heading. So you should get the value of the TEXT_OUTLINE_LEVEL attribute and use that
to determine which HTML heading tag to use.

I’m not sure what the best way of dealing with outline levels beyond 6 is, since HTML only
goes up to h6. I’d suggest for now just making that a normal paragraph.
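
As a rough sketch of that mapping, the function below turns the text of an outline-level
attribute into a heading level. HTML only defines h1 to h6, so anything outside 1 to 6
(or anything that is not a whole number) comes back as 0, meaning "use a normal
paragraph". The name heading_level_from_outline is hypothetical, not an existing function
in the codebase.

```c
#include <stdlib.h>

/* Hypothetical helper: converts an outline-level attribute value into an
 * HTML heading level in the range 1..6. Returns 0 when the value is missing,
 * is not a whole number, or falls outside 1..6; the caller would then emit
 * a normal paragraph instead of a heading. */
static int heading_level_from_outline(const char *value)
{
    if (value == NULL || *value == '\0')
        return 0;

    char *end = NULL;
    long level = strtol(value, &end, 10);

    if (*end != '\0')               /* trailing junk, e.g. "2x" or "abc" */
        return 0;
    if (level < 1 || level > 6)     /* no matching HTML heading element */
        return 0;
    return (int)level;
}
```

In the converter, the value would presumably come from
DFGetAttribute(node, TEXT_OUTLINE_LEVEL), and a non-zero result could be turned into a tag
with HTML_H1 + level - 1, assuming the heading tags are numbered consecutively, as the
quoted code already assumes.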

> Because I will need to make more such things to match the attribute
> values, I'm wondering if we could use regex.h instead, or if that is
> too unix specific and not available on other platforms.

I don’t believe it’s available on Windows. At any rate, I would suggest avoiding regular
expressions in the codebase unless there’s a really compelling need. If you come across
other situations where you think a regex would be appropriate let me know; from what I’ve
seen of ODF I think we should be able to get away without them.

—
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

