cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [proposal] Cocoon documentation system
Date Mon, 17 Jan 2005 05:32:54 GMT
Sylvain Wallez wrote:

>> Ok, captchas + human moderation is clearly too high of a barrier for 
>> spammers and even for defacers. Even infra@ would not have a problem 
>> with that.
> 
> There's an interesting chapter on circumventing captchas at wikipedia 
> [1]. Are we "interesting enough" in terms of google ranking to attract 
> such things?

Yeah! Apache is probably in the top 1000 web sites in google. I think we are 
definitely a target!

Captchas are not impossible to break, of course, but they are *hard 
enough* and provide a first filtering barrier.

A second filtering barrier could be heuristic analysis of the comment.

But distributed human moderation is clearly a barrier that no spammer 
will be able to pass, unless the load becomes dramatic and at that 
point, blacklisting is the way to go.

>>> Two question to Stefano (and everybody else): I proposed numbers as 
>>> document IDs? What do you think about this?
>>
>> I used to be a fanatic of 'readable URL'... but I think they present 
>> more problems than they solve.
>>
>> First of all, the encoding is a pain. It's fine for english, but until 
>> we have IRI (internationalized resource identifiers, think "unicode 
>> meets URI") support forget chinese, japanese, cyrillic, hebrew, korean 
>> and so on.
>>
>> One normal solution is to have an english title even for non-english 
>> pages. I dislike that, it's very anglo-centric.
> 
> Well, consider the state of Cocoon, the ASF, the opensource world and 
> the whole IT industry: they're all anglo-centric. Would you have the 
> same concerns if this was esperanto or interlingua rather than english 
> (or more precisely "international english")?
> 
> Furthermore, translations must follow the original reference docs, which 
> is the english one. So having all language-specific resources use the 
> same name as their english counterpart isn't a problem to me.

fair enough, and coming from a Frenchman that is rather something ;-)

>> Second, people like Nielsen argue that readable URLs are easier to use 
>> and to remember. I think it's bullshit. Not even my bookmarks satisfy 
>> me anymore in terms of link management (del.icio.us + google killed my 
>> browser bookmarks), do you really think I would type in or remember 
>> any URL today? nonsense.
> 
> I do remember a lot of URLs, provided of course that they are 
> meaningful. And I have a very powerful tool to help me crawling in this 
> tree of URLs that I know: the Firefox address bar autocompletion (which 
> BTW is just a reuse of the unix command-line behaviour).
> 
> And the more you use a URL, the more it engraves into your mind. Nothing 
> new in the cognition area here, but that means that a lot of regular 
> users of Cocoon know the URLs space of its documentation by heart, or at 
> least the main directory names.

here we are different. I use google even to get to the cocoon web page 
because typing "cocoon" and return is easier than typing 
http://cocoon.apache.org/ ... actually, I now do [ctrl-space] co, since 
I have my bookmarks (both local and del.icio.us) managed by Quicksilver.

Readability of URLs just doesn't matter much to me.

>> There are a few values of a readable URL. The first is actionable 
>> breadcrumbs.
> 
> Breadcrumbs should better be generated from the navigational structure 
> rather than the page path, even if both often match.

I agree here completely.

>> So, if you find yourself in
>>
>>   http://site.com/a/b/c/d/e/f/g
>>
>> you can automatically infer something like
>>
>>   site.com > a > b > c > d > e > f > g
>>
>> and, for shitty web sites, that is a *tremendous* navigation help. For 
>> URLs like
>>
>>   http://site.com/page/39884984
>>
>> that's it, there is no hierarchical context that you can infer from it.
>>
>> Now, we will not have a shitty web site, so this argument doesn't 
>> apply and Amazon (which is the most used e-commerce site in the world 
>> *and* has the worst URL space ever imagined!) shows that URL-space 
>> design does not impact usability, if the pages don't require so.
> 
> Yeah, but Amazon is a large catalogue of things, not a documentation 
> covering lots of different subjects from introduction to details.

True enough.
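
A minimal sketch of the breadcrumb inference quoted above, using only the 
URL path and nothing else. The function names are made up for 
illustration; this is not how any existing site does it.

```python
from urllib.parse import urlsplit

def breadcrumbs(url):
    """Return (label, href) pairs inferred purely from the URL path."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    root = f"{parts.scheme}://{parts.netloc}/"
    trail = [(parts.netloc, root)]
    for i, seg in enumerate(segments, start=1):
        # Each crumb links to the path prefix that ends at this segment.
        trail.append((seg, root + "/".join(segments[:i])))
    return trail

def render(url):
    """Join the crumb labels the way the quoted example does."""
    return " > ".join(label for label, _ in breadcrumbs(url))

print(render("http://site.com/a/b/c/d/e/f/g"))
print(render("http://site.com/page/39884984"))
```

For the flat URL the same code only yields "site.com > page > 39884984", 
which is the point: there is no hierarchy to infer, so the navigation 
has to come from the page itself.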

But hypermedia allows a page to reside in more than one "trail of 
reading", while a hierarchical navigation imposes a TOC-like view, which 
might satisfy (and feel natural to) one user but look ugly and totally 
unfamiliar to others.

I think it's the "cataloguing" part that makes writing documentation so 
hard and that's why things like wikipedia are taking off so much instead.

I personally think that the problem with documentation is that there are 
two concerns:

  - writers
  - assemblers

blogs, email, wikis, all share a common paradigm: you don't need to 
'assemble' your thoughts, you just dump them. Other people do the assembly.

If you wish, this is the beauty of microcontent: massive parallelization 
(and the reason why the web bloomed: it removed the 
"editing/cataloging" bottleneck).

but the problem was that searching for stuff used to be a nightmare (see 
early days of altavista). This "mare magnum" of content with no apparent 
structure made people "get lost" very easily.

This is the same feeling you have in a wiki. You have a trail of the 
pages that you have visited, but that's useless (you have it in your 
browser too!), you want to be able to "browse" the content, go from this 
content to something that is relevant to you.

In a book, this "relevance" was done by the author (or the editors) and 
was placed in sequential order. Or, if not, clustered in chapters or 
sections.

What a wiki misses (even the good ones like Confluence) is such a 
"clustering" notion... something that is easy to achieve with a more 
structured system, like Forrest, by means of tabs or trees of links.

The problem with this approach is that there is only one way of 
clustering: repurposing pages becomes hell (and that's why there are so 
many broken links.. because the clustering evolves not only with the 
content of the page, but with the surroundings).

By separating the concepts of writers and assemblers, you not only 
unleash a tremendous effort in content production (as our wiki 
showed)... but you allow this content to be "clusterized" and, hear 
hear, *in parallel*!

"Conditio sine qua non" of the above is a flat URL space.

Numeric? no, not necessarily, but flat for sure.
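
To make the writer/assembler split concrete, here is a rough sketch, 
assuming pages are stored under flat IDs and each assembler keeps an 
independent "trail" over those IDs. The IDs, titles, and trail names are 
all invented for the example.

```python
# Writers produce pages in a flat space: id -> title. IDs are made up.
pages = {
    "3984948": "Continuations in flow",
    "3984949": "Sitemap basics",
    "3984950": "Installing Cocoon",
}

# Assemblers cluster the same pages independently of each other.
trails = {
    "beginner": ["3984950", "3984949"],
    "programmer": ["3984949", "3984948"],
}

def url_for(page_id, base="http://cocoon.apache.org/2.2/"):
    """The URL depends only on the page ID, never on any trail."""
    return base + page_id

def trails_containing(page_id):
    """A page can sit in more than one trail of reading."""
    return sorted(name for name, ids in trails.items() if page_id in ids)

def reorder(trail, new_order):
    """Rearrange one trail; page URLs and other trails are untouched."""
    if sorted(new_order) != sorted(trails[trail]):
        raise ValueError("reorder must keep the same set of pages")
    trails[trail] = list(new_order)

before = url_for("3984949")
reorder("programmer", ["3984948", "3984949"])
print(url_for("3984949") == before)
print(trails_containing("3984949"))
```

Reordering one trail changes neither the URLs nor the other trails, which 
is what lets the clustering happen in parallel.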

>> Actually, since geeks are used to hack into URLs but normal people do 
>> not, having a flat or bad URL space forces usability people to think 
>> about navigation in the page and not outside.
> 
> How much I dislike such sites that require me to go from the main page 
> to go down to a particular page that I've already seen...

Sure, but that's a usability problem of the site, not of the URL space 
"per se".

>> Another argument, and probably more important, is that a flat URL 
>> structure gives a sense of 'wikiness' that people have come to dislike.
>>
>> Now, again, this is a false impression (inspired by a plethora of bad 
>> practices rather than effectual technological limitations) but a 
>> strong one nevertheless (I do feel the same about it at times).
>>
>> But *exactly* because of that, I think we should be brave and show the 
>> world that a flat URL space *does not* automatically yield 'wiki-like' 
>> flat spaces that are extremely painful to navigate.
>>
>> Flat numeric URL spaces have also extremely interesting advantages:
>>
>>  - pages can have their titles adjusted without impacting persistence 
>> (links are more solid over time)
> 
> Adjusting a title doesn't mean you change its content, in which case 
> there's no need to change its name. And if its content changes, then 
> it's a different page with a different name.

Fair enough.

>>  - pages can be rearranged/repurposed/re-aggregated/re-used without 
>> impacting persistence
> 
> Agree for "rearranged", as a flat space allows changing the navigation 
> tree without impacting path names. Now repurposing a page requires 
> changing its name (or id) and re-aggregating means removing (aggregation) 
> or adding (split) some pages.

Yep.

>>> Another question is the structure of URLs - the new efforts of 
>>> Sylvain who wants to provide some docs in French needs some thinking 
>>> where to put them.
>>
> 
> Wait, wait! I haven't proposed to translate the docs!! This is a 
> tremendous effort! I proposed to just translate the introductory 
> page to accompany the french-speaking mailing-list.

eheh, sure, but Reinhard did a good thing in bringing this up.

>>> I propose
>>>
>>> http://c.a.o/  ............... editable global docs (own repository)
>>> http://c.a.o/fr/ ............. editable global docs in French (own 
>>> repository)
>>> http://c.a.o/2.2/ ............ editable docs of 2.2 (own repository)
>>> http://c.a.o/2.2/fr/ ......... editable docs of 2.2 in French (own 
>>> repository)
>>> http://c.a.o/2.2.1/ .......... "frozen" docs of the 2.2.1 release
>>> http://c.a.o/2.2.1/fr/ ....... "frozen" French docs of the 2.2.1 release
>>
>> I don't think we should have frozen docs at any time, they are 
>> included in the distributions anyway and those distributions will be 
>> persisted for the longest time.
>>
>> Sun did this with the Java API and created a mess: people 
>> linked to java/1.4.2/ and then 1.4.3 was created and all links broke 
>> down.
>>
>> If a document shipped in 2.1.3 has a bug and was fixed in 2.1.4, why 
>> would anybody want to see it? and if 2.1.4 removed something useful 
>> for 2.1.3, that's a bug and we should fix it in the doc, rather than 
>> make everything available on the web.
>>
>> So I'm -1 on this.
> 
> Agree. We may want to keep around the docs for each major release (i.e. 
> 2.0, 2.1, 2.2) as Tomcat does, but certainly not the docs for minor 
> releases (i.e. 2.2 and 2.2.1).

Cool.

>> As for french docs, I *strongly* think that we should do this thru 
>> content-negotiation rather than URL design. A person accessing the 
>> page with a french browser will get the page in french, that's all 
>> they have to know (and the page will have a series of flags that will 
>> trigger an overload in locale, but that's going to be a parameter of 
>> the URL, not part of it).
>>
>> The language a page is written in, just like the data-type of the page, 
>> should not belong in the URL.
>>
>> This makes the URL space way more "solid" over time: I can link to
>>
>>  http://cocoon.apache.org/2.2/3984948
>>
>> and *be sure* that it will be there a few years from now and, by then, 
>> maybe a translation in my native language would have popped up!
> 
> And why shouldn't e.g. 
> http://cocoon.apache.org/2.1/userdocs/flow/continuations.html be there?

example: because somebody decided to split the user section by "concern 
area" and so continuation now belongs to

  http://cocoon.apache.org/2.1/users/programmer/flow/continuations.html

but not everybody thought that this was a good idea, so we have a 
redirect from the old URI to the new one... but down the road, somebody 
from the Lisp world comes along and shows how the term "continuation" is 
actually misleading and convinces us to change it to 
"webcontinuations", so that we now have a redirect from the old URL to 
the newer one to the newest one.
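
A small sketch of what that chain looks like to whoever maintains it, 
assuming the redirects live in a simple lookup table. The third path 
(with "webcontinuations") is my guess at what the renamed page would be 
called; only the first two paths appear in this thread.

```python
# Old path -> new path. The last target is a hypothetical rename.
redirects = {
    "/2.1/userdocs/flow/continuations.html":
        "/2.1/users/programmer/flow/continuations.html",
    "/2.1/users/programmer/flow/continuations.html":
        "/2.1/users/programmer/flow/webcontinuations.html",
}

def resolve(path, table=redirects, max_hops=10):
    """Follow redirects until a path with no further redirect is reached.

    Returns (final_path, hops). Raises ValueError on a loop or on a
    chain longer than max_hops.
    """
    hops = 0
    seen = {path}
    while path in table:
        path = table[path]
        hops += 1
        if path in seen or hops > max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(path)
    return path, hops

print(resolve("/2.1/userdocs/flow/continuations.html"))
```

Every restructuring adds one more hop for old links, and every hop is one 
more table entry that somebody has to keep alive.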

But it's true that persistence of a URL is a property of those 
administering it, not of the URL itself.
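
Going back to the content-negotiation point quoted above, here is a 
simplified sketch of picking the page language from the browser's 
Accept-Language header, with an explicit override passed as a URL 
parameter. The function names and the idea of an "override" argument are 
assumptions for illustration, and wildcard ("*") handling is left out.

```python
def parse_accept_language(header):
    """Return language tags ordered by descending q-value."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a tag without an explicit q-value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs.append((tag.strip().lower(), q))
    # The sort is stable, so ties keep the order given by the browser.
    prefs.sort(key=lambda p: p[1], reverse=True)
    return [tag for tag, q in prefs if q > 0]

def choose_language(accept_header, available, override=None, default="en"):
    """Pick a language: explicit override first, then the header."""
    if override and override.lower() in available:
        return override.lower()
    for tag in parse_accept_language(accept_header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "fr-fr" falls back to "fr"
        if primary in available:
            return primary
    return default

print(choose_language("fr-FR,fr;q=0.9,en;q=0.8", {"en", "fr"}))
print(choose_language("fr", {"en", "fr"}, override="en"))
```

The URL stays the same in both calls; only the header or the parameter 
decides which translation is served.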

>> let's be brave!
> 
> Let's be brave and dive into a fog of meaningless URLs? I'm not 
> convinced...

I showed why I want a flat URL space.

Now, I could be convinced to change from

  http://cocoon.apache.org/2.2/3940834

to

  http://cocoon.apache.org/2.2/continuations

but only if we mandate that titles cannot contain '/'. This will force 
people to test for URL naming collisions and will carve an 
anglophone-centric view into our system (for now and forever!), but I can 
live with that.
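
Roughly, the rule would amount to something like the sketch below. The 
lowercasing and the space-to-hyphen mapping are my assumptions (nothing 
above fixes a normalization), and the names are invented for the example.

```python
class TitleCollision(Exception):
    """Raised when a title maps to a name that is already in use."""

def claim_name(title, taken, base="http://cocoon.apache.org/2.2/"):
    """Turn a title into a flat URL, or refuse it.

    `taken` is the set of names already in use; it is updated in place.
    """
    # Assumed normalization: trim, lowercase, spaces become hyphens.
    name = title.strip().lower().replace(" ", "-")
    if not name:
        raise ValueError("empty title")
    if "/" in name:
        # A '/' would reintroduce hierarchy into the flat space.
        raise ValueError("titles may not contain '/'")
    if name in taken:
        raise TitleCollision(name)
    taken.add(name)
    return base + name

taken = set()
print(claim_name("Continuations", taken))
```

A second page titled "continuations" would now be rejected, which is the 
collision test that numeric IDs never need.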

thoughts?

-- 
Stefano.

