lucene-dev mailing list archives

From Steven A Rowe <sar...@syr.edu>
Subject RE: character escapes in source? ... was: Re: Eclipse: Invalid character constant
Date Thu, 07 Apr 2011 20:37:39 GMT
+1

I took an all-of-the-above approach, including the Unicode character description, for the
ASCIIFoldingFilter-based stuff.  E.g. from the mapping file <http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/mapping-FoldToASCII.txt?view=markup>:

	# Ä [LATIN CAPITAL LETTER A WITH DIAERESIS]
	"\u00C4" => "A"
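
A minimal sketch of how the same convention might look in Java source, assuming a hypothetical helper named foldA (this is not the actual GermanLightStemmer code). Each case label uses a Unicode escape, with the character's Unicode name in a trailing comment, so the file stays pure ASCII and compiles the same way whatever encoding the IDE assumes:

```java
public class EscapedFoldingSketch {

    // Hypothetical helper for illustration only: folds a few accented
    // variants of 'a' to plain 'a' and returns every other char unchanged.
    static char foldA(char c) {
        switch (c) {
            case '\u00E4': // LATIN SMALL LETTER A WITH DIAERESIS
            case '\u00E0': // LATIN SMALL LETTER A WITH GRAVE
            case '\u00E1': // LATIN SMALL LETTER A WITH ACUTE
                return 'a';
            default:
                return c;
        }
    }

    public static void main(String[] args) {
        // The input is written as an escape too, so no non-ASCII bytes
        // appear anywhere in this file.
        System.out.println(foldA('\u00E4'));
    }
}
```

The mapping file goes one step further and also puts the literal character in the comment, which is safe there because the file is read with a known encoding.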

Steve

> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
> Sent: Thursday, April 07, 2011 4:28 PM
> To: Lucene Dev
> Subject: character escapes in source? ... was: Re: Eclipse: Invalid
> character constant
> 
> 
> replying to dev...
> 
> : in eclipse you need to set your project's character encoding to UTF-8.
> 	...
> : > Some language specific classes like GermanLightStemmer has invalid
> : > character
> : > compiler errors for code like:
> : >      switch(s[i]) {
> : >        case 'ä':
> : >        case 'à':
> : >        case 'á':
> : > in Eclipse with JDK 1.6
> 
> ...i seem to remember something similar coming up in the past, and I
> thought we decided we should use java unicode character escapes instead of
> literal UTF-8 characters in the source to minimize the number of headaches
> (and make it more self documenting *exactly* what character we were using).
> 
> should we revisit this?
> 
> 
> -Hoss