incubator-jspwiki-dev mailing list archives

From Harry Metske <harry.met...@gmail.com>
Subject Re: 2 last failing unit tests
Date Thu, 05 Nov 2009 05:49:17 GMT
So that would mean, for example:

[MYPAGE] => [ M Y P A G E ]
[IPPhone]   => [ I P Phone]
[mypagE] => [mypag E]

looks a bit odd to me
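
To make the examples above easy to reproduce, here is a minimal Java sketch of the quoted definition ("an uppercase letter that follows a non-whitespace character"). The class and method names are hypothetical and not part of JSPWiki. The rule is applied to the page name inside the brackets, so no space is added before the first letter; if the opening bracket were counted as a preceding character, a leading space would appear as well.

```java
// Hypothetical helper for illustration only; not JSPWiki code.
final class CamelCaseExpander {

    // Inserts one space before every uppercase letter that directly
    // follows a non-whitespace character.
    static String expand(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean followsNonWhitespace =
                i > 0 && !Character.isWhitespace(name.charAt(i - 1));
            if (Character.isUpperCase(c) && followsNonWhitespace) {
                out.append(' ');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The three names from the examples above.
        for (String name : new String[] {"MYPAGE", "IPPhone", "mypagE"}) {
            System.out.println("[" + name + "] => [" + expand(name) + "]");
        }
    }
}
```

With this rule, acronyms are split letter by letter, which is the oddity pointed out above.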


2009/11/5 Andrew Jaquith <andrew.r.jaquith@gmail.com>

> I'd define it as "an uppercase letter that follows a non-whitespace
> character."
>
> On Wed, Nov 4, 2009 at 2:52 PM, Harry Metske <harry.metske@gmail.com>
> wrote:
> > agreed on the 1) and 2)
> >
> > But how exactly do you define "adding a space before each uppercase letter
> > that starts a word" ?
> > How do you find this "uppercase letter that starts a word" in a pagename or
> > link ?
> > Can you give a few samples ?
> >
> > /Harry
> >
> > 2009/11/2 Andrew Jaquith <andrew.r.jaquith@gmail.com>
> >
> >> Ok, that makes sense. I can think of cases in English too, like
> >> "averse" (opposed to) and "a verse" (a portion of a song or poem). I
> >> just decided that I didn't care. :)
> >>
> >> But assuming we do care...
> >>
> >> ...what about going the other way: on import, or on page save, or page
> >> lookup, forcibly expanding CamelCasePageNames (and inline page links)
> >> so that they have one space in between the words? That way,
> >> case-insensitive matching with spaces preserved (trimmed to one space)
> >> would work.
> >>
> >> So, the rules would be this:
> >>
> >> (1) When links in pages are parsed, or page names are saved, leading
> >> and trailing spaces will be trimmed, and all whitespace between words
> >> will be replaced with one space character.
> >> (2) Whitespace before and after the space name will be removed.
> >> (3) CamelCase page links or page names will be normalized by adding a
> >> space before each uppercase letter that starts a word.
> >> (4) Tests for page name equality are done by applying rules (1), (2)
> >> and (3) and making a case-insensitive comparison.
> >>
> >> That seems simple enough, no?
> >>
> >> Andrew
> >>
> >> On Mon, Nov 2, 2009 at 2:44 PM, Janne Jalkanen <janne.jalkanen@iki.fi>
> >> wrote:
> >> >> Can you provide some examples where a
> >> >> strip-the-whitespace-and-do-a-case-insensitive-comparison strategy
> >> >> would not work, in Finnish? I'd like to understand this, seriously.
> >> >
> >> > E.g. "maan alle" vs "maanalle". First means "into the ground", the
> >> > next one is "earth bear".
> >> >
> >> > Or "kuusi puuta" vs "kuusipuuta" - "six trees" vs "at a fir" (or "of
> >> > fir timber").
> >> >
> >> > Or simply "sivusta katsoja" vs "sivustakatsoja" - "a person who looks
> >> > (literally) from the sides" vs "onlooker".  The difference is subtler
> >> > than with the previous ones, but the existence of the space is
> >> > significant information.
> >> >
> >> > In fact, getting mixed up when two words go together and when they do
> >> > not is one of the most common grammatical errors.  Sometimes the
> >> > results can be fairly hilarious and unintended.  Often it looks just
> >> > sad.
> >> >
> >> > But the point being that in Finnish (and other so-called constructed
> >> > languages), whitespace is significant.  So it should not be ignored
> >> > arbitrarily.
> >> >
> >> > Besides, I am not aware of any wiki engines that would consider
> >> > whitespace insignificant in determining pagename equality. MediaWiki's
> >> > rules concerning spaces are:
> >> >
> >> > <snip>
> >> > Spaces/underscores which are ignored:
> >> > * those at the start and end of a full page name
> >> > * those at the end of a namespace prefix, before the colon
> >> > * those after the colon of the namespace prefix
> >> > * duplicate consecutive spaces
> >> > <snap>
> >> >
> >> >> FYI, I took a look at JSPWiki.org to see what the scale of the problem
> >> >> might be. The site has about 4850 pages. I yanked down all of the page
> >> >> names and compared them. I detected exactly ONE name clash: "Text
> >> >> formatting rulesKorean" and "TextformattingrulesKorean" appear to be
> >> >> different pages. That is a 0.02% collision rate -- and easily handled
> >> >> by a rename-on-import or special-page redirection strategy.
> >> >
> >> > That's not what I meant.  I meant that we have many links of the form
> >> > [word1 word2] embedded within running text.  If we change those, then
> >> > the running text becomes meaningless and needs to be *checked by
> >> > hand*.
> >> >
> >> > /Janne
> >> >
> >>
> >
>
