incubator-jspwiki-dev mailing list archives

From Janne Jalkanen
Subject Re: Metadata in 3.0 [Was: JSPWiki 3 design notes]
Date Tue, 05 Feb 2008 13:08:00 GMT
> Now, before getting into this too deeply it occurs to me that we might
> consider a pluggable meta API rather than single metadata schema. There

Um.  Pluggable?  No.  That'll create loads of problems.

User-access to the metadata?  Absolutely.   And I think that is what
you really mean - ability to add your own arbitrary metadata for any

> WorldCat), Dublin Core is used in almost the entirety of the world's
> libraries for lightweight interchangeable metadata and is compatible
> with and/or the basis of the designs used by the W3C and its "semantic
> web".

Semantic web is actually a load of bollocks.  But it has some nice

> I note that many of the proposed field names come from Atom. While this
> is perhaps an appropriate usage, Atom is a syndication schema, not a
> content repository schema. There's not a huge difference and Atom is in
> large parts (semantically) compatible with and influenced by Dublin Core
> (e.g., choice of atom:creator). For documents stored in a repository I
> believe Dublin Core is likely more appropriate.

There are some reasons why I chose Atom identifiers; I was involved in
its definition (somewhat), and therefore I know some of the reasons
why Atom does not use Dublin Core.  Partly because some of the
definitions were a bit complicated.

> Historically, there are two Dublin Core schemas, DC.* and DCTERMS.*.
> The original core set (about a dozen) of Dublin Core Metadata Elements
> (DC.*) have been grandfathered into the set of DC Terms (DCTERMS, see
> footnote). For our purposes below, we can consider DC.* and DCTERMS.*
> as identical namespaces (they by definition now are).

I wasn't aware of dcterms.  Interesting.

>  * atom:updated As in RFC 4287. This is a DATE.
> Recommendation: Use DCTERMS.modified. [ or]

The semantics of atom:updated and dcterms.modified differ - and I seem
to recall that that difference is minuscule, but actually very
important.  Can't dig up the reference now, will do later.
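
A rough sketch of why such a distinction can matter. It assumes the difference is the one RFC 4287 describes for atom:updated, namely that it changes only when the publisher considers a modification significant, whereas a last-modified date changes on every save. The class and method names here (PageTimestamps, recordEdit) are made up for illustration and are not JSPWiki API.

```java
import java.time.Instant;

// Hypothetical holder for the two dates; not part of JSPWiki.
public class PageTimestamps {
    private Instant modified; // changes on every save (dcterms.modified-like)
    private Instant updated;  // changes only on significant edits (atom:updated-like)

    public PageTimestamps(Instant created) {
        this.modified = created;
        this.updated = created;
    }

    // "significant" is the publisher's own judgement, e.g. not a typo fix.
    public void recordEdit(Instant when, boolean significant) {
        modified = when;
        if (significant) {
            updated = when;
        }
    }

    public Instant getModified() { return modified; }
    public Instant getUpdated() { return updated; }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2008-02-05T13:08:00Z");
        PageTimestamps page = new PageTimestamps(t0);
        page.recordEdit(t0.plusSeconds(60), false); // minor edit
        System.out.println("modified=" + page.getModified());
        System.out.println("updated=" + page.getUpdated());
    }
}
```

With a single stored date, a feed reader could not tell a typo fix from a real revision, which is the practical reason to settle the semantics before picking a name.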

>  * atom:published As in RFC 4287. As JSPWiki does not yet support
>   "draft" -pages, this is essentially a creation date. NB: This cannot
>    be checked from page version #1, because that might be deleted.
>    This is a DATE.
> Recommendation: Use DCTERMS.created. Agreed: this must be carried
> through all revisions since it provides a canonical container for the
> origin date of the document. [ or]

Probably better.

>  * atom:id As in RFC 4287. This has some advantages, and can easily be
>    tied to the JCR jcr:uuid. This is a STRING.
> Recommendation: Use DCTERMS.identifier. [DC.identifier]
>    STRING (URI?)

Nope.  Atom:id is a very, very useful construct.  As you mentioned in
the last email, you probably want to use dc:identifier for your own

> Recommendation: Use DCTERMS.creator. The Atom specification seems to borrow
> extensively from DC, with atom:author identical with the concept of
> DC.creator (they apparently just didn't like the term 'creator' and
> changed it to 'author'), but do use 'contributor' in the same manner
> (again, paraphrasing the terminology from DC). This will need to occur
> in all revisions since we need to maintain the original author ID
> regardless of the existence of a given revision. [DC.creator]

I seem to recall that dc requires a specific notation for the user
data - which might be incompatible with what we have (essentially the
uid).  It might be useful to provide a pseudo-property dc:creator
which is constructed out of wiki:creator and UserDatabase data.
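
A minimal sketch of what such a pseudo-property could look like. It assumes wiki:creator holds the uid and uses a plain Map as a stand-in for the UserDatabase lookup; the class name, the method name, and the choice of the full name as the dc:creator value are all assumptions made for illustration, not the real JSPWiki API or a DC requirement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; the Map stands in for a UserDatabase lookup.
public class DcCreatorSketch {

    // Derives dc:creator on the fly from the stored wiki:creator uid.
    public static String dcCreator(Map<String, String> pageProperties,
                                   Map<String, String> fullNamesByUid) {
        String uid = pageProperties.get("wiki:creator");
        if (uid == null) {
            return null; // no creator recorded, so nothing to derive
        }
        // Fall back to the raw uid when the user is not in the database.
        return fullNamesByUid.getOrDefault(uid, uid);
    }

    public static void main(String[] args) {
        Map<String, String> users = new HashMap<>();
        users.put("jdoe", "Jane Doe");

        Map<String, String> page = new HashMap<>();
        page.put("wiki:creator", "jdoe");

        System.out.println(dcCreator(page, users));
    }
}
```

Only wiki:creator is stored; dc:creator is computed when read, so it stays in step with later changes to the user's profile.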

> Recommendation: Use DCTERMS.contributor.  The idea with DC.creator and
> DC.contributor is that the former is the original creator (author) of
> a resource, and any subsequent contributions (editing, translation, etc.)
> are considered as being done by a 'contributor'. For the original author,
> see wiki:creator (DC.creator) above. [DC.contributor]

I am not certain whether dc:contributor is syntactically okay.

> Recommendation: Use new application profile wiki:content. Question
> as to binary stream? Not STRING?

JPEGs are badly presented as Strings.
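
To illustrate the point, here is a small sketch showing that arbitrary binary data does not survive a round trip through a String. The class and method names are invented for this example, and UTF-8 is an assumed encoding; the four bytes are the usual start of a JPEG file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of JSPWiki.
public class BinaryAsString {

    // True if decoding the bytes as UTF-8 text and re-encoding gives them back unchanged.
    public static boolean survivesStringRoundTrip(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return Arrays.equals(data, asText.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // 0xFF is never valid in UTF-8, so decoding replaces it and the data is lost.
        byte[] jpegStart = { (byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0 };
        byte[] wikiText = "Hello, wiki".getBytes(StandardCharsets.UTF_8);

        System.out.println("JPEG bytes survive: " + survivesStringRoundTrip(jpegStart));
        System.out.println("Wiki text survives: " + survivesStringRoundTrip(wikiText));
    }
}
```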

> Recommendation: Use DCTERMS.format. This is the term used to contain
> a format identifier.  While I recognise that these discussions tend to

I would need to check if it's okay.

> devolve rather quickly, I would highly recommend considering the MIME
> or Internet Media Type as "application/*" instead of "text/*", e.g.,
> "application/x-wiki+jspwiki". In looking at the history of "text/html"
> vs. "application/html" this would suggest that text formats that use
> a significant amount of processing to perform rendering generally move
> towards being considered more an application than a text format (i.e.,
> that while they may be largely human readable they quickly become
> indecipherable or largely unreadable in practice when used with plugins
> and other complex syntax, e.g., many if not most pages on Wikipedia).
> [DC.format] STRING.

I don't really know about this.  I don't care much.

> Recommended: Use new application profile wiki:state. Enumerated value
> set. Not labeled as BOOLEAN but seems to be.

Yes, boolean.

> In summary, while I see Atom as interesting and in large part semantically
> compatible with Dublin Core, I think it'd be better to incorporate a
> schema that was designed more specifically for resources than for feeds;
> the definitions fit more closely with our usage.

I think we need to define the exact semantics of the properties we
want to use, and then choose what is most appropriate - or define our own.

I'll need to check dcterms, though.

