incubator-jspwiki-user mailing list archives

From Murray Altheim <murra...@altheim.com>
Subject Re: Page types
Date Mon, 07 Jul 2008 11:04:44 GMT
 > Christophe Dupriez wrote:
 >> Hi all!
 >>
 >> Thanks a lot for the healthy discussion and advice: I will let you
 >> know what I finally (and practically!) end up doing.
 >>
 >> Thanks Murray for the reference to "Balisage" conference in Montreal:
 >> http://www.balisage.net/At-A-Glance.html

You're quite welcome -- glad to hear of your interest. Balisage is
the new name for what used to be called Extreme Markup, which has
been going on for many years each August, and represents many of
the world's markup experts' latest work.

 >> To come back to the "tagging" concept, personally I see two different
 >> situations:
 >> * the classification of documents (the document "type"), which determines
 >>   its lifecycle (workflow), its access rules, its layout, etc.
 >>   Each document has one class (type), possibly in a hierarchy of
 >>   classes.

What one might call "administrative metadata".

 >> * the indexing of documents ("keywords"), which indicates their topics,
 >>   processing status, linked places, epochs, people, organizations, etc.

What one might call "classification metadata", though processing status
belongs more properly in your first category, administrative metadata.

 >> I see tags more as part of the indexing system than of the
 >> classification. But a discussion on this may just turn into a discussion
 >> about typed versus untyped programming languages!!!

I actually (as do librarians) differentiate between tags and facets. Tags
are generally not from a controlled vocabulary, and are generally added and
used by users (rather than by librarians, publishers, administrators, etc.).
Facets are a bit like tags, but usually come from controlled vocabularies
and themselves often fit into a hierarchy or ontology (the latter term I'm
using colloquially to mean a graph structure of hierarchical as well as
other relation types).

I don't tend to mix this in with typing, but then again I am in a library
environment where classification and categorization are the norm.

Typing implies not classification but membership in a class, where "class"
has some kind of formal definition, often set theoretic, collection-based,
or some other form of mathematically precise formalism. When we work with
language we need to stop considering it as that formal -- it's not --
which is one of the reasons the "Semantic Web" is such dog's bollocks,
i.e., one is simply using the wrong tool for the job. On the other hand,
librarians have got it right:  using human language as a way to classify
human language, without any hand-waving about it having some kind of
formal mathematical basis.

 >> To support classification also implies:
 >> 1) Page Renaming is bug-free even with uppercase accented letters,
 >> spaces and special characters: since the page name embeds the class, it
 >> must be possible to change it without problems, even for dictionaries
 >> with complex names
 >>
 >> 2) a solution is found for multilingualism (pages with different names
 >> in different languages but grouped as "equivalent by translation":
 >> references would consider pages in a "translation group" as being the
 >> same)

The problem you seem to be aware of is that any given entity can be (and
indeed, usually is) classified under multiple schemes; each can be
considered as a different context. These different contexts can themselves
be thought of in different contexts, recursively. Language is complex,
and our use of it even more convoluted, especially since we're talking
here about shared language. Computational linguistics used to make a lot
of claims about machine "understanding" that have since been pushed out
many more decades into the future.

So while an entity (say, a book) might contain subjects of World War II,
England, military history, predestination, the Tarot, it could contain
many more depending on what a given culture or audience might find
significant (an open question, to be sure). Therefore there might be
classification metadata pertaining to its subjects (such as contained in
a Dublin Core "Subject" term), as well as publication metadata such as
publication date (e.g., 1973), Author (e.g., Thomas Pynchon), and Publisher
(e.g., "Viking Press" or perhaps a person's name if self-published), and
then all that administrative metadata (sometimes called "accession
information"). And then there's the natural language of authorship
("English") as well as translation ("Arabic"). And a LOT more.

I think between the two of us it's easy to see that this can get very
complicated very quickly, and one of the purposes of a wiki is to try to
provide a system that is both sufficient for one's needs but also as
simple as possible. As engineers we tend to fail in the simplicity
department.

Tags have become popular because of their simplicity, but a big problem
with tags is that they are an incredibly weak tool when used for
classification of resources because they do not come (generally) from a
controlled vocabulary, nor do they (generally) form a structure of their
own, so there's no synonymy, no inheritance, etc.
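
To make "synonymy" and "inheritance" concrete, here is a rough sketch of
what a controlled vocabulary offers that a bag of free-form tags does not.
It is only an illustration: the class and method names are hypothetical
and are not taken from JSPWiki or from any system mentioned in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a tiny controlled vocabulary with synonyms and
// broader-term links. All names here are illustrative assumptions.
final class ControlledVocabulary {
    // Maps every accepted label (preferred or synonym) to its preferred term.
    private final Map<String, String> preferredTerm = new HashMap<>();
    // Maps a preferred term to its broader (parent) term, if it has one.
    private final Map<String, String> broaderTerm = new HashMap<>();

    // Registers a preferred term; pass null as broader for a top-level term.
    void addTerm(String term, String broader) {
        preferredTerm.put(term, term);
        if (broader != null) {
            broaderTerm.put(term, broader);
        }
    }

    // Registers a synonym for a term that is already in the vocabulary.
    void addSynonym(String synonym, String term) {
        String preferred = resolve(term);
        if (preferred == null) {
            throw new IllegalArgumentException("Unknown term: " + term);
        }
        preferredTerm.put(synonym, preferred);
    }

    // Synonymy: returns the preferred term, or null for an unknown label.
    String resolve(String label) {
        return preferredTerm.get(label);
    }

    // Inheritance: true if label equals ancestorLabel or sits below it.
    boolean isNarrowerOrEqual(String label, String ancestorLabel) {
        String current = resolve(label);
        String target = resolve(ancestorLabel);
        if (current == null || target == null) {
            return false;
        }
        // Bound the walk so an accidental cycle cannot loop forever.
        for (int steps = 0; current != null && steps <= broaderTerm.size(); steps++) {
            if (current.equals(target)) {
                return true;
            }
            current = broaderTerm.get(current);
        }
        return false;
    }
}
```

With "World War II" registered under "Military history", which in turn sits
under "History", and "WWII" added as a synonym, a search on "History" can
find a page indexed as "WWII". A free-form tag such as "ww2" simply resolves
to nothing, which is exactly the weakness described above.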

I've developed a system that forces each tag (as well as each Subject,
Predicate, and Object in an Assertion) to be represented by a wiki page.
I.e., an assertion or tag will fail if the page does not exist: the user
gets an error message that includes a create page link. This is perhaps
a bit more work than simply being able to list a bunch of words, but
this forces documentation, hopefully minimizes spelling errors, and
permits those tags (and Subjects, Predicates, and Objects) to themselves
be tagged or otherwise linked into the greater ontology via asserted
relations, the whole wiki becoming a big user-generated ontology. This
is what I'll be presenting on in August.
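
For anyone who wants to picture the rule, a minimal sketch of the check
might look like the following. This is not the actual implementation: the
class, its methods, the in-memory set standing in for the page store, and
the "Edit.jsp?page=" link prefix are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "every tag and every assertion term must be a
// wiki page". None of these names come from the system described above.
final class PageBackedVocabulary {
    // Stand-in for the wiki's page store: the names of existing pages.
    private final Set<String> existingPages = new HashSet<>();
    // Assumed prefix for a "create this page" link.
    private final String createLinkPrefix;

    PageBackedVocabulary(String createLinkPrefix) {
        this.createLinkPrefix = createLinkPrefix;
    }

    void addPage(String pageName) {
        existingPages.add(pageName);
    }

    // A tag is accepted only if a page with that name exists.
    List<String> checkTag(String tag) {
        return check(tag);
    }

    // Subject, Predicate, and Object must each be backed by a page.
    List<String> checkAssertion(String subject, String predicate, String object) {
        return check(subject, predicate, object);
    }

    // Returns one error message per missing page; an empty list means the
    // tag or assertion is accepted. A real wiki would also URL-encode the
    // page name before putting it in the link.
    private List<String> check(String... terms) {
        List<String> errors = new ArrayList<>();
        for (String term : terms) {
            if (!existingPages.contains(term)) {
                errors.add("No page named '" + term + "'. Create it: "
                        + createLinkPrefix + term);
            }
        }
        return errors;
    }
}
```

So an assertion such as (GravitysRainbow, hasAuthor, ThomasPynchon) is
rejected until all three pages exist, and each rejection carries a link
the user can follow to create the missing page.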

a bit of wind, the tell-tales have now fallen...

Murray

...........................................................................
Murray Altheim <murray07 at altheim.com>                           ===  = =
http://www.altheim.com/murray/                                     = =  ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk               = =  = =

       Boundless wind and moon - the eye within eyes,
       Inexhaustible heaven and earth - the light beyond light,
       The willow dark, the flower bright - ten thousand houses,
       Knock at any door - there's one who will respond.
                                       -- The Blue Cliff Record
