httpd-dev mailing list archives

From Basant Kukreja <Basant.Kukr...@Sun.COM>
Subject Re: svn commit: r691418 [2/2] - in /httpd/httpd/trunk: ./ docs/manual/mod/ modules/filters/
Date Wed, 10 Sep 2008 06:55:37 GMT
On Tue, Sep 09, 2008 at 10:19:47PM -0700, Basant Kukreja wrote:
> >
> > What are the above constants supposed to be. Opening the file in vi shows that they
> > are special characters or better control characters. Looking with a hex editor these ()
> > seem to be \\08.
> > Is this correct?
> 
> Sed has a "l" command. From the sed man page :
>      (2)l            List the pattern space on the standard  out-
>                      put  in  an  unambiguous  form. Non-printing
>                      characters are spelled in  two  digit  ASCII
>                      and long lines are folded.
> 
> From the code :
>                    p3 = trans[(unsigned char)*p1-1];
>                     while ((*p2++ = *p3++) != 0)
>                         if(p2 >= eval->lcomend) {
>                             *p2 = '\\';
>                             wline(eval, eval->genbuf, strlen(eval->genbuf));
>                             p2 = eval->genbuf;
>                         }
> 
> 
> It looks to me that it is trying to print character from value 0 to 31 as
> printable characters.
> 
> >> +    "-<",
> >> +    "->",
> It seems to me that it should be \\08 and \\09.
> 
> I will dig deeper to see if these can be simplified. It looks little weird that
> there are binary characters in source file.
> 
> Regards,
> Basant.
> 
I investigated further. I wrote a test file containing each byte value from 0 to 31, one per line:
$ od -c out.txt
0000000  \0  \n 001  \n 002  \n 003  \n 004  \n 005  \n 006  \n 007  \n
0000020  \b  \n  \t  \n  \n  \n 013  \n  \f  \n  \r  \n 016  \n 017  \n
...

And a small sed script:
$ cat one.sed
l
d

The sed script just runs the "l" command on each line and then deletes the line:
$ /usr/ucb/sed -f one.sed out.txt  > out1.txt

Here is the content of out1.txt:
$ od -c out1.txt
0000000  \n   \   0   1  \n   \   0   2  \n   \   0   3  \n   \   0   4
0000020  \n   \   0   5  \n   \   0   6  \n   \   0   7  \n   -  \b   <
0000040  \n   -  \b   >  \n  \n  \n   \   1   3  \n   \   1   4  \n   \
0000060   1   5  \n   \   1   6  \n   \   1   7  \n   \   2   0  \n   \
...

$ cat out1.txt
\01
\02
\03
\04
\05
\06
\07
<
>


\13
\14

-------------------------------------------

So for some strange reason:
0x8 is converted to "-\b<" and
0x9 is converted to "-\b>"

That's what we see in the "trans" variable.

Do you think this could be a bug in the original sed, and should we correct it?

It should probably print "\10" and "\11".
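
To make the suggestion concrete, here is a rough sketch of how such an escape could be produced. The helper name octal_escape is hypothetical and is not part of mod_sed or of the original sed code; it only shows the two-digit octal form that the other control characters already get.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (an assumption, not mod_sed code): format one
 * control character as a backslash followed by at least two octal
 * digits, e.g. 0x1 -> "\01", 0x8 -> "\10", 0x9 -> "\11".
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small.
 */
static int octal_escape(unsigned char c, char *buf, size_t len)
{
    int n = snprintf(buf, len, "\\%02o", (unsigned int)c);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

With this, 0x8 and 0x9 would come out as "\10" and "\11", consistent with "\07" before them and "\13" after them in the output above.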

BTW, /usr/bin/sed has exactly the same behavior.

It seems strange, though, that this was never caught in the sed code.

Regards,
Basant.

Note:
GNU sed (gsed) behaves differently. It prints these two characters as "\b$" and "\t$".
$ sed -f one.sed out.txt
\000$
\001$
\002$
\003$
\004$
\005$
\006$
\a$
\b$
\t$
$
$
\v$
\f$
\r$
\016$
\017$
\020$
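
For comparison, the escapes visible in the GNU sed output above could be sketched as follows. This is only an illustration derived from that output, not code taken from GNU sed; the function name gnu_style_escape is made up, and it is only meant for the control characters 0 to 31 (newline is the line terminator here, so it never reaches this function).

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Illustrative only (an assumption based on the output above, not
 * GNU sed source): map a control character to the escape seen in the
 * listing. Named escapes are used for \a \b \t \v \f \r; everything
 * else becomes a backslash plus three octal digits.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int gnu_style_escape(unsigned char c, char *buf, size_t len)
{
    const char *named = NULL;
    int n;

    switch (c) {
    case '\a': named = "\\a"; break;
    case '\b': named = "\\b"; break;
    case '\t': named = "\\t"; break;
    case '\v': named = "\\v"; break;
    case '\f': named = "\\f"; break;
    case '\r': named = "\\r"; break;
    default:   break;
    }

    if (named != NULL)
        n = snprintf(buf, len, "%s", named);
    else
        n = snprintf(buf, len, "\\%03o", (unsigned int)c);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

The trailing "$" in the listing marks the end of each line and would be appended by the caller after the escaped pattern space.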
