incubator-lucy-dev mailing list archives

From Marvin Humphrey
Subject Re: [KinoSearch] Compile 0.30_07 on FreeBSD 7
Date Mon, 16 Nov 2009 17:38:14 GMT
On Sun, Nov 15, 2009 at 10:10:42AM -0800, Nathan Kurz wrote:

> Quoting by hand isn't that hard, and you can probably even make the
> preprocessor do most of the work for you:
> #define QUOTE(x) #x "\n"
> static char lseek_code[] =
>     QUOTE(%s)
>     QUOTE(#include "_charm.h")
>     QUOTE(int main() {)
>     QUOTE(  int fd;)
>     QUOTE(  Charm_Setup;)
>     QUOTE(  fd = open("_charm_lseek", O_WRONLY | O_CREAT, 0666);)
>     QUOTE(  if (fd == -1) { exit(-1); })
>     QUOTE(  %s(stdout, 0, SEEK_SET);)
>     QUOTE(  printf("%%d", 1);)
>     QUOTE(  if (close(fd)) { exit(-1); })
>     QUOTE(  return 0;)
>     QUOTE(})
>     ;

This is potentially pretty nice.

I'm bikeshedding here, but I think code clarity improves with some padding:

static char lseek_code[] =
    QUOTE(  %s                                                        )
    QUOTE(  #include "_charm.h"                                       )
    QUOTE(  int main() {                                              )
    QUOTE(      int fd;                                               )
    QUOTE(      Charm_Setup;                                          )
    QUOTE(      fd = open("_charm_lseek", O_WRONLY | O_CREAT, 0666);  )
    QUOTE(      if (fd == -1) { exit(-1); }                           )
    QUOTE(      %s(stdout, 0, SEEK_SET);                              )
    QUOTE(      printf("%%d", 1);                                     )
    QUOTE(      if (close(fd)) { exit(-1); }                          )
    QUOTE(      return 0;                                             )
    QUOTE(  }                                                         )
    ;

I believe the rules for preprocessor stringification require that leading and
trailing whitespace be discarded, so those lines aren't any longer as far as
the compiler is concerned.  That's what GCC does at least...

    All leading and trailing whitespace in text being stringified is ignored.
    Any sequence of whitespace in the middle of the text is converted to a
    single space in the stringified result. Comments are replaced by
    whitespace long before stringification happens, so they never appear in
    stringified text. 

... but I think it's in the standard, since what's actually happening is the
stringification of a "macro argument", and if I understand correctly macro
argument parsing rules require whitespace collapsing.
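
A minimal sketch of that whitespace claim, assuming the one-argument QUOTE
macro from Nathan's message.  The names `tight`, `padded` and
`padding_is_ignored` are made up for illustration and are not part of
Charmonizer:

```c
#include <string.h>

#define QUOTE(x) #x "\n"

/* Same tokens, different padding inside the macro argument. */
const char tight[]  = QUOTE(int fd;);
const char padded[] = QUOTE(      int    fd;                 );

/* Returns 1 if both spellings stringify to the same text, 0 otherwise. */
int padding_is_ignored(void)
{
    return strcmp(tight, padded) == 0;
}
```

On a conforming preprocessor both arrays should hold "int fd;" followed by a
newline, so the padded layout costs nothing in the generated string.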

There are other implications to parsing as a macro argument before
stringifying.  We can't have unbalanced parens within any quote group.  I'm
also kind of worried about braces/blocks -- GCC and MSVC seem to handle them
OK, but will other compilers?

[... time passes while Marvin researches...]

Rats, I've found a nasty drawback.  :(  

This is a compiler error:

    QUOTE( int i, j; );

The comma causes the contents of the QUOTE macro to be parsed as two
arguments, and QUOTE accepts only one.
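
For illustration, here is a sketch of where commas are and aren't a problem,
assuming the same one-argument QUOTE macro.  Commas nested inside parentheses
don't split the argument, which is why the open() call in the lseek snippet
gets through.  The array names `call_ok` and `decl_ok` are hypothetical:

```c
#define QUOTE(x) #x "\n"

/* A top-level comma splits the argument list, so this would not compile:
 *
 *     QUOTE( int i, j; )
 */

/* Commas nested inside parentheses stay within the single argument. */
const char call_ok[] = QUOTE(  lseek(fd, 0, SEEK_SET);  );

/* One way to dodge the problem: one declaration per statement. */
const char decl_ok[] = QUOTE(  int i; int j;  );
```

That keeps simple snippets working, but it's one more rule to remember
whenever a snippet gets edited.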

> I'm not sure whether "stringification" is standard or a GCC extension,
> but Google seems to confirm that Visual Studio supports the same
> syntax.

Preprocessor stringification is definitely C standard -- C89, as a matter of
fact.  That's important for Charmonizer, which is supposed to run on any
C89-compliant compiler.  The Lucy core has more esoteric prerequisites -- e.g.
some sort of memory mapping support -- but Charmonizer really ought to run on
any lowest common denominator C89 system.

> You can even do very clear multiline stuff if you aren't worried about
> losing the newlines:
>     QUOTE(
>         int main() {
>             int fd;
>             Charm_Setup;
>             fd = open("_charm_lseek", O_WRONLY | O_CREAT, 0666);
>             if (fd == -1) { exit(-1); }
>             %s(stdout, 0, SEEK_SET);
>             printf("%%d", 1);
>             if (close(fd)) { exit(-1); }
>             return 0;
>         }
>       );

This is nice and clear, certainly.  Newline preservation would be nice, but it
isn't important -- these test-compile snippets are all small enough that
troubleshooting isn't going to be a problem.

It's funny, but MSVC can actually handle the style above, while GCC has a quirk:
it seems that the argument to the QUOTE macro has to start on the same line,
or you need a continuation backslash:


    char *foo = QUOTE(stuff);


    char *foo = QUOTE(


    char *foo = QUOTE(\

> There are some oddities with regard to comments getting dropped, but these
> can be manually quoted.  

Comments aren't important.  None of the snippets are long enough to need them.

> There's also an oddity with using directives like "#include" as a multiline
> argument, but one can work around this too (manual or single line QUOTE).  

It looks like we can embed the newline within the quoted material.

  #include <stdio.h>

  #define QUOTE(x) #x "\n"

  const char code[] = QUOTE(\
    #include <stdio.h>\n
    int main() {
        printf("Greetings, Earthlings!\n");
        return 0;
    }
  );

  int main() {
      printf("%s", code);
      return 0;
  }

That compiles and runs fine under maximally strict GCC:

$ gcc -ansi -pedantic -Wall -Wextra -std=c89 preprocessor_quote.c 
$ ./a.out 
#include <stdio.h>
 int main() { printf("Greetings, Earthlings!\n"); return 0; }

> In my opinion it's worth the small hassle for the greater simplicity.

Excellent.  I love destroying complicated code.

I think our best bet is to go with real string literals and manual escaping.
IMO, there are too many limitations, quirks and gotchas to make preprocessor
stringification worthwhile.  
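
As a rough sketch of what that looks like, here is the lseek snippet from
above written as an ordinary string literal with manual escaping.  The helper
`format_lseek_test` and its parameters are hypothetical, and I'm assuming the
template gets filled in sprintf-style, since it contains %s and %%d:

```c
#include <stdio.h>
#include <string.h>

/* The lseek test snippet as a plain string literal, escaped by hand. */
static const char lseek_code[] =
    "%s\n"
    "#include \"_charm.h\"\n"
    "int main() {\n"
    "    int fd;\n"
    "    Charm_Setup;\n"
    "    fd = open(\"_charm_lseek\", O_WRONLY | O_CREAT, 0666);\n"
    "    if (fd == -1) { exit(-1); }\n"
    "    %s(stdout, 0, SEEK_SET);\n"
    "    printf(\"%%d\", 1);\n"
    "    if (close(fd)) { exit(-1); }\n"
    "    return 0;\n"
    "}\n";

/* Hypothetical helper: fill in the include block and the name of the seek
 * function.  Returns the number of characters written, or -1 if the buffer
 * might be too small. */
int format_lseek_test(char *buf, size_t buf_size,
                      const char *includes, const char *lseek_func)
{
    /* Upper bound: each "%s" becomes its argument and "%%" shrinks to "%". */
    size_t needed = sizeof(lseek_code) + strlen(includes) + strlen(lseek_func);
    if (buf_size < needed) { return -1; }
    return sprintf(buf, lseek_code, includes, lseek_func);
}
```

The escaping is noisier to read, but commas, parens, newlines and #include
lines all behave the way any C programmer expects.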

Even if we accepted the burden of coding to the known limitations, we'll
really be pushing the boundaries of macro argument stringification.  I'm
concerned that we'll encounter corner case bugs in additional compilers.
Charmonizer might potentially be used for *discovering* such limitations,
but it shouldn't fail to build and run because of them.

Are we on the same page?

Marvin Humphrey
