incubator-jspwiki-dev mailing list archives

From Janne Jalkanen
Subject Re: Metadata in 3.0 [Was: JSPWiki 3 design notes]
Date Tue, 05 Feb 2008 16:40:05 GMT
> I don't think it will. There's a core set of fields but their names
> should probably be abstractions. I'm trying to think through how this
> might work without loads of problems. There's so many applications
> for JSPWiki (in terms of how it might fit into other applications)
> that we'll need to fit into others' metadata schemes. What I'm
> talking about are really surface names for things.

Yes, it will.  If the provider has to figure out mapping between  
different concepts in the database, it'll create problems.

This is exactly why namespaces were invented, and this is also why it  
would probably be a better idea NOT to reuse Dublin Core, but to  
stick to our own schema.

> Well, yes, but also having the field names match a given schema. Maybe
> some kind of transformation feature, dunno.

I think namespaces are quite enough for us.  I don't really want to  
code for the case in case someone wants to use "wiki:author" for some  
other purpose.
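
A rough sketch of what I mean by namespaces being enough: each property
name is a prefix plus a local name, and the prefix expands to a namespace
URI. The "wiki" URI below is made up for illustration (it is not an
agreed JSPWiki namespace), and expand() is a hypothetical helper, not
part of any existing API.

```python
from typing import Tuple

# Prefix -> namespace URI. The "wiki" URI is an assumption for this sketch.
NAMESPACES = {
    "wiki": "http://example.org/jspwiki/ns/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def expand(qname: str) -> Tuple[str, str]:
    """Split 'prefix:local' and replace the prefix with its namespace URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing namespace prefix in {qname!r}")
    return (NAMESPACES[prefix], local)

# Similar-sounding properties stay distinct because their namespaces differ.
print(expand("wiki:author") == expand("dc:creator"))  # False
```

With this, a backend never has to guess whether "author" means ours or
someone else's; it only sees the fully qualified name.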

If people want, they *can* rewrite their own backend in such a way  
that it converts everything into paper notes stuck onto a donkey  
glued to a wall somewhere in Pakistan with the word "CUCKOO" written  
on the backside - but after the JCR interface, I don't really care  
what transformations you do.

>>> Well, I also mentioned that I really doubt that I'd be using  
>>> dc:identifier
> for those purposes within the JSPWiki metadata profile. I can also see
> creating a suitable ID within our own namespace, but I really think
> dc:identifier would suit fine. We'd not be abusing it at all.

Ah yes, now I found it.  From RFC 5013:

"Element Name:   identifier

    Label:       Identifier
    Definition:  An unambiguous reference to the resource within a given
                 context.
    Comment:     Recommended best practice is to identify the
                 resource by means of a string conforming
                 to a formal identification system."

Whereas from RFC 4287 (Atom)

"Its content MUST be an IRI, as defined by [RFC3987].  Note that the
    definition of "IRI" excludes relative references.  Though the IRI
    might use a dereferencable scheme, Atom Processors MUST NOT assume it
    can be dereferenced.

    When an Atom Document is relocated, migrated, syndicated,
    republished, exported, or imported, the content of its atom:id
    element MUST NOT change.  Put another way, an atom:id element
    pertains to all instantiations of a particular Atom entry or feed;
    revisions retain the same content in their atom:id elements.  It is
    suggested that the atom:id element be stored along with the
    associated resource.

    The content of an atom:id element MUST be created in a way that
    assures uniqueness.

    Because of the risk of confusion between IRIs that would be
    equivalent if they were mapped to URIs and dereferenced, the
    following normalization strategy SHOULD be applied when generating
    atom:id elements:

    o  Provide the scheme in lowercase characters.
    o  Provide the host, if any, in lowercase characters.
    o  Only perform percent-encoding where it is essential.
    o  Use uppercase A through F characters when percent-encoding.
    o  Prevent dot-segments from appearing in paths.
    o  For schemes that define a default authority, use an empty
       authority if the default is desired.
    o  For schemes that define an empty path to be equivalent to a path
       of "/", use "/".
    o  For schemes that define a port, use an empty port if the default
       is desired.
    o  Preserve empty fragment identifiers and queries.
    o  Ensure that all components of the IRI are appropriately character
       normalized, e.g., by using NFC or NFKC.

    Comparing atom:id

    Instances of atom:id elements can be compared to determine whether an
    entry or feed is the same as one seen before.  Processors MUST
    compare atom:id elements on a character-by-character basis (in a
    case-sensitive fashion).  Comparison operations MUST be based solely
    on the IRI character strings and MUST NOT rely on dereferencing the
    IRIs or URIs mapped from them.

    As a result, two IRIs that resolve to the same resource but are not
    character-for-character identical will be considered different for
    the purposes of identifier comparison.

    For example, these are four distinct identifiers, despite the fact
    that they differ only in case:

    Likewise, these are three distinct identifiers, because IRI
    %-escaping is significant for the purposes of comparison:"


I like atom:id much more than the dc:identifier, because
a) atom:id conforms to very precise semantics, including comparison  
rules (which dc:identifier does not give)
b) atom:id is defined as globally unique and non-dereferenceable  
(which helps a *lot* when you don't get people assuming that there's  
something at the end of your IRI)
c) atom:id is defined as an IRI instead of a URI (small difference,  
but might be important)
d) atom:id is defined as unique across the entire lifespan of the  
entity, which dc:identifier is not.
e) Atom feeds make a lot of sense to use, even in a wiki context (and  
you need the atom:id anyway)

Since atom:id is a machine-processable entity, having clear,
machine-understandable rules as to what it really is, is very, very
important.  For dc:identifier, it's pretty much handwaving.
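
To make the comparison rule concrete, here is a minimal sketch. The
function name same_atom_id() and the example.org / example.com IRIs are
only illustrative; the point is that the comparison is a plain,
case-sensitive string comparison with no normalization and no
dereferencing.

```python
def same_atom_id(a: str, b: str) -> bool:
    """Compare two atom:id values character by character, case-sensitively.

    Nothing is normalized and nothing is fetched; the strings themselves
    are the identifiers.
    """
    return a == b

# Illustrative IRIs that differ only in case.
ids = [
    "http://www.example.org/thing",
    "http://www.example.org/Thing",
    "http://www.EXAMPLE.org/thing",
    "HTTP://www.example.org/thing",
]

print(same_atom_id(ids[0], ids[1]))  # False
print(len(set(ids)))                 # 4

# Percent-escaping is significant too: these do not compare equal.
print(same_atom_id("http://www.example.com/%7ebob",
                   "http://www.example.com/%7Ebob"))  # False
```

Nothing comparable can be written for dc:identifier without first
deciding, per application, what kind of string it holds.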

> Not that I'm aware of. DC doesn't get into that kind of thing much
> except when you get to things like dates.

I would actually like to use the atom:person construct here, since it  
has better semantics (it adds an IRI to a name, which can be useful  
in figuring out across wikis who actually authored what).  But it  
might be easier just to store a local identifier, in which case dc  
is as good as any.
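
As a sketch of why the IRI helps: an Atom person construct carries a
name plus an optional IRI, so two wikis can agree on who an author is
even if they spell the name differently. The Person class and
same_author() below are hypothetical names for illustration, not a
proposed JSPWiki API, and the IRIs are made up.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Person:
    name: str                  # human-readable name
    uri: Optional[str] = None  # IRI associated with the person, if known

def same_author(a: Person, b: Person) -> bool:
    """Treat two authors as the same only when both carry an identical IRI.

    Names alone are not trusted, since different people can share a name.
    """
    return a.uri is not None and a.uri == b.uri

alice_here = Person("Alice", "http://example.org/people/alice")
alice_there = Person("A. Example", "http://example.org/people/alice")
print(same_author(alice_here, alice_there))        # True
print(same_author(Person("Bob"), Person("Bob")))   # False
```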

> It certainly suits the role of both dc:creator, editor, translator,
> etc. (i.e., very general purpose), anyone who contributes to the
> resource.

But again, the definition is a bit handwavy.

>>> Recommendation: Use DCTERMS.format. This is the term used to contain
>>> a format identifier.  While I recognise that these discussions  
>>> tend to
>> I would need to check if it's okay.
> That one is pretty common.

Unfortunately, it just says that the "best practice" is to use  
something like MIME.  Now the problem is that in order to consider  
e.g. data portability, there's no way to say that "this  
dcterms:format" means a MIME type.  So again, a system processing the  
information needs to resort to context-sensitive processing (e.g.  
"ok, so this comes from jspwiki, so it's always a MIME type").     
Which isn't really very good.  This is why I would like to have an  
unambiguous "wiki:contentType" definition, which can also be reflected  
in a non-modifiable pseudoproperty "dcterms:format".

E.g. "wiki:contentType contains a STRING, which denotes the MIME  
content type of the content as defined in RFC XXXX [MIME]."

For example, if it's just defined as a String, how do you define  
equivalence rules?  Is it okay to put in IMAGE/JPG, or ImAgE/jpG, or  
image/jpg? If you do not know that these are MIME types, and RFC XXXX  
defines MIME comparison as case-insensitive, then your application  
might treat equal values as different and behave incorrectly.
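
A minimal sketch of what an unambiguous definition buys us, assuming
the value is known to be a MIME type (whose type and subtype names
compare case-insensitively). The PageProperties class and
canonical_content_type() are hypothetical illustrations, not existing
JSPWiki code; "dcterms:format" is modelled as a read-only mirror of
"wiki:contentType".

```python
def canonical_content_type(value: str) -> str:
    """Lowercase the type/subtype so equal MIME types compare equal.

    Parameters such as '; charset=UTF-8' are dropped in this sketch.
    """
    media_type = value.split(";", 1)[0].strip()
    kind, sep, subtype = media_type.partition("/")
    if not sep or not kind or not subtype:
        raise ValueError(f"not a type/subtype pair: {value!r}")
    return f"{kind.lower()}/{subtype.lower()}"

class PageProperties:
    """Stores wiki:contentType and exposes dcterms:format as a read-only view."""

    def __init__(self, content_type: str) -> None:
        self._props = {"wiki:contentType": canonical_content_type(content_type)}

    def get(self, name: str) -> str:
        if name == "dcterms:format":
            return self._props["wiki:contentType"]
        return self._props[name]

    def set(self, name: str, value: str) -> None:
        if name == "dcterms:format":
            raise PermissionError("dcterms:format is a read-only pseudoproperty")
        if name == "wiki:contentType":
            value = canonical_content_type(value)
        self._props[name] = value

page = PageProperties("ImAgE/jpG")
print(page.get("wiki:contentType"))  # image/jpg
print(page.get("dcterms:format"))    # image/jpg
```

A consumer that only understands Dublin Core still gets a value, but
anything that knows our namespace also knows exactly how to compare it.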

This is really my gripe with Dublin Core - it leaves too much open to  
interpretation.  Which makes it really good for people, but  
cumbersome for computers.

> It's a Big Deal for a lot of people, I probably don't care much  
> either.
> I use 'text/wiki' for general purpose wiki text and the application
> one above to specifically tag JSPWiki wiki text.

I don't think you can use text/wiki - it's missing the "x-" ;-)

It might be interesting to just adopt the practice other wiki engines  
are using.

