incubator-jspwiki-dev mailing list archives

From Harry Metske <harry.met...@gmail.com>
Subject Re: 2 last failing unit tests
Date Wed, 04 Nov 2009 19:52:54 GMT
Agreed on 1) and 2).

But how exactly do you define "adding a space before each uppercase letter
that starts a word"?
How do you find this "uppercase letter that starts a word" in a page name or
link?
Can you give a few samples?

/Harry

2009/11/2 Andrew Jaquith <andrew.r.jaquith@gmail.com>

> Ok, that makes sense. I can think of cases in English too, like
> "averse" (opposed to) and "a verse" (a portion of a song or poem). I
> just decided that I didn't care. :)
>
> But assuming we do care...
>
> ...what about going the other way: on import, or on page save, or page
> lookup, forcibly expanding CamelCasePageNames (and inline page links)
> so that they have one space in between the words? That way,
> case-insensitive matching with spaces preserved (trimmed to one space)
> would work.
>
> So, the rules would be this:
>
> (1) When links in pages are parsed, or page names are saved, leading
> and trailing spaces will be trimmed, and all whitespace between words
> will be replaced with one space character.
> (2) Whitespace before and after the space name will be removed.
> (3) CamelCase page links or page names will be normalized by adding a
> space before each uppercase letter that starts a word.
> (4) Tests for page name equality are done by applying rules (1), (2)
> and (3) and making a case-insensitive comparison.
>
> That seems simple enough, no?
>
> Andrew
>
> On Mon, Nov 2, 2009 at 2:44 PM, Janne Jalkanen <janne.jalkanen@iki.fi>
> wrote:
> >> Can you provide some examples where a
> >> strip-the-whitespace-and-do-a-case-insensitive-comparison strategy
> >> would not work, in Finnish? I'd like to understand this, seriously.
> >
> > E.g. "maan alle" vs "maanalle". First means "into the ground", the
> > next one is "earth bear".
> >
> > Or "kuusi puuta" vs "kuusipuuta" - "six trees" vs "at a fir" (or "of
> > fir timber").
> >
> > Or simply "sivusta katsoja" vs "sivustakatsoja" - "a person who looks
> > (literally) from the sides" vs "onlooker".  The difference is subtler
> > than with the previous ones, but the existence of the space is
> > significant information.
> >
> > In fact, getting mixed up when two words go together and when they do
> > not is one of the most common grammatical errors.  Sometimes the
> > results can be fairly hilarious and unintended.  Often it looks just
> > sad.
> >
> > But the point being that in Finnish (and other so-called constructed
> > languages), whitespace is significant.  So it should not be ignored
> > arbitrarily.
> >
> > Besides, I am not aware of any wiki engines that consider
> > whitespace insignificant in determining page name equality.  MediaWiki's
> > rules concerning spaces are:
> >
> > <snip>
> > Spaces/underscores which are ignored:
> > * those at the start and end of a full page name
> > * those at the end of a namespace prefix, before the colon
> > * those after the colon of the namespace prefix
> > * duplicate consecutive spaces
> > <snap>
> >
> >> FYI, I took a look at JSPWiki.org to see what the scale of the problem
> >> might be. The site has about 4850 pages. I yanked down all of the page
> >> names and compared them. I detected exactly ONE name clash: "Text
> >> formatting rulesKorean" and "TextformattingrulesKorean" appear to be
> >> different pages. That is a 0.02% collision rate -- and easily handled
> >> by a rename-on-import or special-page redirection strategy.
> >
> > That's not what I meant.  I meant that we have many links of the form
> > [word1 word2] embedded within running text.  If we change those, then
> > the running text becomes meaningless and needs to be *checked by
> > hand*.
> >
> > /Janne
> >
>
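
For reference, rules (1) to (4) from the quoted proposal could be sketched
roughly as below. This is only an illustration, not JSPWiki code: the class
and method names are hypothetical, and the definition of "an uppercase letter
that starts a word" (an uppercase letter that directly follows a lowercase
letter or a digit) is an assumption, since the thread leaves that open.

```java
public class PageNameSketch {

    // Rules (1) and (2): trim both ends and collapse each run of
    // whitespace between words into a single space.
    public static String collapseWhitespace(String name) {
        return name.trim().replaceAll("\\s+", " ");
    }

    // Rule (3), under an ASSUMED definition: an uppercase letter "starts a
    // word" when it directly follows a lowercase letter or a digit.
    // The regex matches the empty position between the two characters.
    public static String expandCamelCase(String name) {
        return name.replaceAll("(?<=[\\p{Ll}\\p{Nd}])(?=\\p{Lu})", " ");
    }

    // Apply rule (3) first, then rules (1) and (2).
    public static String normalize(String name) {
        return collapseWhitespace(expandCamelCase(name));
    }

    // Rule (4): case-insensitive comparison of the normalized names.
    public static boolean samePage(String a, String b) {
        return normalize(a).equalsIgnoreCase(normalize(b));
    }

    public static void main(String[] args) {
        // prints "Camel Case Page Names"
        System.out.println(normalize("CamelCasePageNames"));
        // true: differs only in case and extra whitespace
        System.out.println(samePage("  camel   case ", "CamelCase"));
        // false: the space in the Finnish example stays significant
        System.out.println(samePage("maan alle", "maanalle"));
    }
}
```

Under this assumed definition, runs of capitals are not split, so "JSPWiki"
stays as it is and "JSPWikiPage" becomes "JSPWiki Page". Whether that is the
desired behaviour is exactly the open question about rule (3).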
