incubator-jspwiki-dev mailing list archives

From Andrew Jaquith <andrew.r.jaqu...@gmail.com>
Subject Re: 2 last failing unit tests
Date Thu, 05 Nov 2009 02:49:29 GMT
I'd define it as "an uppercase letter that follows a non-whitespace character."
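
A rough sketch of how that definition could combine with rules (1), (3) and (4) quoted below, in Java. The class and method names here are hypothetical and not taken from the JSPWiki code base, and rule (2), the space name, is left out. It is only an illustration of the idea, not a proposed patch.

```java
// Hypothetical sketch; none of these names exist in JSPWiki.
final class PageNameSketch {

    // Rule (1): trim leading/trailing whitespace and collapse
    // every run of whitespace between words to a single space.
    static String collapseWhitespace(String name) {
        return name.trim().replaceAll("\\s+", " ");
    }

    // Rule (3): insert a space before each uppercase letter that
    // follows a non-whitespace character. The pattern is zero-width:
    // a lookbehind for \S and a lookahead for an uppercase letter.
    static String expandCamelCase(String name) {
        return name.replaceAll("(?<=\\S)(?=\\p{Lu})", " ");
    }

    // Apply rule (3), then rule (1).
    static String normalize(String name) {
        return collapseWhitespace(expandCamelCase(name));
    }

    // Rule (4): normalize both names, then compare case-insensitively.
    static boolean samePage(String a, String b) {
        return normalize(a).equalsIgnoreCase(normalize(b));
    }
}
```

With this, "CamelCasePageNames" expands to "Camel Case Page Names", while "maan alle" and "maanalle" stay distinct because existing spaces are preserved. One consequence of taking the definition literally: runs of capitals get split too, so "JSPWiki" becomes "J S P Wiki".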

On Wed, Nov 4, 2009 at 2:52 PM, Harry Metske <harry.metske@gmail.com> wrote:
> agreed on the 1) and 2)
>
> But how exactly do you define "adding a space before each uppercase letter
> that starts a word" ?
> How do you find this "uppercase letter that starts a word" in a pagename or
> link ?
> Can you give a few samples ?
>
> /Harry
>
> 2009/11/2 Andrew Jaquith <andrew.r.jaquith@gmail.com>
>
>> Ok, that makes sense. I can think of cases in English too, like
>> "averse" (opposed to) and "a verse" (a portion of a song or poem). I
>> just decided that I didn't care. :)
>>
>> But assuming we do care...
>>
>> ...what about going the other way: on import, or on page save, or page
>> lookup, forcibly expanding CamelCasePageNames (and inline page links)
>> so that they have one space in between the words? That way,
>> case-insensitive matching with spaces preserved (trimmed to one space)
>> would work.
>>
>> So, the rules would be this:
>>
>> (1) When links in pages are parsed, or page names are saved, leading
>> and trailing spaces will be trimmed, and all whitespace between words
>> will be replaced with one space character.
>> (2) Whitespace before and after the space name will be removed.
>> (3) CamelCase page links or page names will be normalized by adding a
>> space before each uppercase letter that starts a word.
>> (4) Tests for page name equality are done by applying rules (1), (2)
>> and (3) and making a case-insensitive comparison.
>>
>> That seems simple enough, no?
>>
>> Andrew
>>
>> On Mon, Nov 2, 2009 at 2:44 PM, Janne Jalkanen <janne.jalkanen@iki.fi>
>> wrote:
>> >> Can you provide some examples where a
>> >> strip-the-whitespace-and-do-a-case-insensitive-comparison strategy
>> >> would not work, in Finnish? I'd like to understand this, seriously.
>> >
>> > E.g. "maan alle" vs "maanalle". First means "into the ground", the
>> > next one is "earth bear".
>> >
>> > Or "kuusi puuta" vs "kuusipuuta" - "six trees" vs "at a fir" (or "of
>> > fir timber").
>> >
>> > Or simply "sivusta katsoja" vs "sivustakatsoja" - "a person who looks
>> > (literally) from the sides" vs "onlooker".  The difference is subtler
>> > than with the previous ones, but the existence of the space is
>> > significant information.
>> >
>> > In fact, getting mixed up when two words go together and when they do
>> > not is one of the most common grammatical errors.  Sometimes the
>> > results can be fairly hilarious and unintended.  Often it looks just
>> > sad.
>> >
>> > But the point being that in Finnish (and other so-called constructed
>> > languages), whitespace is significant.  So it should not be ignored
>> > arbitrarily.
>> >
>> > Besides, I am not aware of any wiki engines that consider
>> > whitespace insignificant in determining pagename equality.  MediaWiki's
>> > rules concerning spaces are:
>> >
>> > <snip>
>> > Spaces/underscores which are ignored:
>> > * those at the start and end of a full page name
>> > * those at the end of a namespace prefix, before the colon
>> > * those after the colon of the namespace prefix
>> > * duplicate consecutive spaces
>> > <snap>
>> >
>> >> FYI, I took a look at JSPWiki.org to see what the scale of the problem
>> >> might be. The site has about 4850 pages. I yanked down all of the page
>> >> names and compared them. I detected exactly ONE name clash: "Text
>> >> formatting rulesKorean" and "TextformattingrulesKorean" appear to be
>> >> different pages. That is a 0.02% collision rate -- and easily handled
>> >> by a rename-on-import or special-page redirection strategy.
>> >
>> > That's not what I meant.  I meant that we have many links of the form
>> > [word1 word2] embedded within running text.  If we change those, then
>> > the running text becomes meaningless and needs to be *checked by
>> > hand*.
>> >
>> > /Janne
>> >
>>
>
