Mailing-List: contact tomcat-dev-help@jakarta.apache.org; run by ezmlm
Date: Sat, 3 Jun 2000 09:45:34 -0700
From: Alex Chaffee <guru@edamame.stinky.com>
To: tomcat-dev@jakarta.apache.org
Cc: Ken Flurchick <kenf@osc.edu>, Armen Ezekielian <abe@osc.edu>,
  haupt@erc.msstate.edu, Jan Labanowski <jkl@osc.edu>
Subject: Re: Tomcat bug
Message-ID: <20000603094534.D3663@edamame.stinky.com>
Reply-To: alex@jguru.com
References: <Pine.LNX.4.21.0006031625210.616-100000@ambient.collab.net>
 <Pine.LNX.4.21.0006031644240.616-100000@ambient.collab.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <Pine.LNX.4.21.0006031644240.616-100000@ambient.collab.net>;
 from ed@apache.org on Sat, Jun 03, 2000 at 04:47:14PM -0700

On Sat, Jun 03, 2000 at 04:47:14PM -0700, Ed Korthof wrote:
> On Sat, 3 Jun 2000, Ed Korthof wrote:
> 
> > This is not a valid Java statement:
> > 
> > 	char c = '\u000d';
> > 
> > because the '\u000d' is not a valid character constant.
> 
> The Java Language Spec is very helpful at times like this.  Here's what it
> has to say about this particular value:
> 
> 	Because Unicode escapes are processed very early, it is not
> 	correct to write '\u000a' for a character literal whose value is
> 	linefeed (LF); the Unicode escape \u000a is transformed into an
>         actual linefeed in translation step 1 (3.3) and the linefeed
>         becomes a LineTerminator in step 2 (3.4), and so the character
>         literal is not valid in step 3. Instead, one should use the escape
>         sequence '\n' (3.10.6). Similarly, it is not correct to write
>         '\u000d' for a character literal whose value is carriage return
>         (CR). Instead, use '\r'.
> 
> This is out of http://java.sun.com/docs/books/jls/html/3.doc.html#100960
> ... IMO, it is kinda lame (why special case these two characters?), but
> that's how the spec is written.

They're *not* special cased -- that's the problem :-)

A \uXXXX escape is translated into the actual character before the
compiler tokenizes the source, so that character lands in the parse
stream exactly as if you had typed it. So...

   char c = '\u000a';

becomes

   char c = '
';


Now you see why that doesn't parse correctly?
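
For what it's worth, here is a minimal sketch of the workaround the
spec recommends: use the '\n' and '\r' escape sequences, which are
resolved when the literal itself is parsed, long after the Unicode
translation step. The class and field names are made up for
illustration; nothing here comes from the Tomcat source.

```java
// Hypothetical demo class; the names are illustrative only.
public class LineTerminatorEscapes {
    // Escape sequences are interpreted while the character literal is
    // parsed, so no raw line terminator ever appears in the source text.
    public static final char LF = '\n';
    public static final char CR = '\r';

    // A Unicode escape for an ordinary character is fine: the translated
    // character (capital A) is legal inside a character literal.
    public static final char CAPITAL_A = '\u0041';

    public static void main(String[] args) {
        System.out.println((int) LF);   // numeric value of linefeed
        System.out.println((int) CR);   // numeric value of carriage return
        System.out.println(CAPITAL_A);
    }
}
```

Note that the same early translation applies inside comments, so the
linefeed or carriage-return Unicode escape will end a // comment too.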

 - Alex