lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: character escapes in source? ... was: Re: Eclipse: Invalid character constant
Date Fri, 08 Apr 2011 06:49:36 GMT
On Fri, Apr 8, 2011 at 03:01, Robert Muir <rcmuir@gmail.com> wrote:
> On Thu, Apr 7, 2011 at 6:48 PM, Chris Hostetter
> <hossman_lucene@fucit.org> wrote:
>>
>> : -1. These files should be readable, for maintaining, debugging and
>> : knowing what's going on.
>>
>> Readability is my main concern ... I don't know (and frequently can't
>> tell) the difference between a lot of non-ASCII characters -- and I'm
>> guessing I'm not alone.  When it's spelled out explicitly using the
>> character name or escape code, there is no ambiguity about what character
>> was intended, or whether it got screwed up by some tool along the way (e.g.:
>> the svn server, an svn client, the patch command, a text editor, an IDE,
>> ant's "fixcrlf" task, etc...)
>
> Please take the time, just 5 or 10 minutes, to look through some of this
> source code and tests.
>
> Imagine if you couldn't just look at the code to see what it does, but
> had to decode from some crazy numeric encoding scheme.
> Imagine if it were this way for things like stopword lists too.
>
> It would be basically impossible for you to look at the code and
> figure out what it does!
> For example, try looking at the Thai analyzer tests: if these were all
> numbers, how would you know wtf is going on?
>
> Although this comes up from time to time, I stand firm on my -1
> because it's important to me for the source code to be readable.
> I'm not willing to give this up just because some people cannot read
> writing system XYZ.
>
> I have said before, I'm willing to change my -1 vote on this if *ALL*
> string constants (including English ones) are changed to be character
> escapes.
> If you imagine what the code would look like if English string
> constants were instead codes, then I think you will understand my
> point of view!
>
> It's really, really important to source code readability to be able to
> open a file and understand what it does, not to have to use some
> decoder because it uses characters other people don't understand.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

I think having both the raw characters /and/ their encoded representation
is best (with one of the two in a comment).
I'm all for Unicode sources, but at least two things bite me repeatedly:
1. Tools do screw up, and you have to recover somehow.
E.g. IntelliJ IDEA's 'shelve' function uses the platform default encoding
(MacRoman in my case), and I've lost some text in changes I shelved but
never committed anywhere.
2. There are characters that look exactly alike.
E.g. the various whitespace and dash characters. Or (if your fonts include
Cyrillic) I dare you to tell apart a/а, c/с, e/е, o/о.
These are different characters from the Latin and Cyrillic scripts (Latin
on the left, Cyrillic on the right), but in 99% of fonts they are visually
identical.
I had a filter that folded similar-looking characters, and it was
documented in exactly this way: raw char + code.
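
To make the "raw char + code" idea concrete, here is a rough sketch of what
such a table could look like in Java. It is not the actual filter I mentioned;
the class name HomoglyphFolder, the four mappings and the helper methods are
all made up for illustration. The code uses \u escapes so it survives a tool
that mangles the file encoding, and the comment next to each entry carries the
raw character plus its Unicode name for the human reader.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class, not part of Lucene.
public class HomoglyphFolder {
    // Illustrative table: Cyrillic look-alike -> Latin letter.
    // The escape is what the compiler sees; the comment is for people.
    private static final Map<Character, Character> FOLD = new HashMap<>();
    static {
        FOLD.put('\u0430', 'a'); // а  CYRILLIC SMALL LETTER A
        FOLD.put('\u0441', 'c'); // с  CYRILLIC SMALL LETTER ES
        FOLD.put('\u0435', 'e'); // е  CYRILLIC SMALL LETTER IE
        FOLD.put('\u043E', 'o'); // о  CYRILLIC SMALL LETTER O
    }

    // Replaces every character found in the table; leaves the rest alone.
    public static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            Character folded = FOLD.get(ch);
            out.append(folded != null ? folded.charValue() : ch);
        }
        return out.toString();
    }

    // Spells a string out as U+XXXX code points, for when the glyphs
    // alone do not tell you which character you are looking at.
    public static String codePoints(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out.append(String.format("U+%04X ", cp)));
        return out.toString().trim();
    }

    public static void main(String[] args) {
        String mixed = "\u0441\u0430t"; // renders like "cat"; first two letters are Cyrillic
        System.out.println(codePoints(mixed));          // U+0441 U+0430 U+0074
        System.out.println(fold(mixed).equals("cat"));  // true
    }
}
```

If a tool does corrupt the comments, the escapes still compile to the intended
characters, and if the escapes are ever in doubt, the raw characters in the
comments show what was meant.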

-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: earwin@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785
