incubator-jspwiki-dev mailing list archives

From Murray Altheim <murra...@altheim.com>
Subject Metadata in 3.0 [Was: JSPWiki 3 design notes]
Date Mon, 04 Feb 2008 22:17:24 GMT
In looking at the 3.0 design document at

    http://www.jspwiki.org/wiki/JSPWiki3Design

I have some comments on the metadata plans. These comments are only
tentative.

----
! Metadata Meta API

Now, before getting into this too deeply, it occurs to me that we might
consider a pluggable meta API rather than a single metadata schema. There
are likely a variety of different applications that JSPWiki may be
used within (simple wikis, embedded apps, hives, parts of document management
systems, etc.), and we likely also want scalability in our metadata (i.e.,
in terms of both simplicity/complexity and factors like page and revision
count), just as we do in other areas. I don't think this sounds
particularly difficult if we're using a JSR-170 compliant repository:
there'd be a core set of metadata fields whose actual descriptors would
be assigned by the API implementation. If an application needed more
than that it'd be up to the implementation to define and handle (e.g.,
because the documents will be used within a more complex framework or
document management system having an existing schema). We'd simply be
creating the API and reference implementation.
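
A minimal Java sketch of what such a pluggable API might look like. All names here (CoreField, MetadataSchema, DublinCoreSchema) are hypothetical, and the "dcterms:" property prefix assumes that namespace has been registered with the JCR repository:

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical set of core fields every schema implementation must support.
enum CoreField { CREATED, MODIFIED, IDENTIFIER, CREATOR, CONTRIBUTOR, FORMAT }

// Hypothetical pluggable schema: the implementation decides which
// repository property name (descriptor) holds each core field.
interface MetadataSchema {
    String descriptorFor(CoreField field);
}

// Hypothetical Dublin Core reference implementation.
final class DublinCoreSchema implements MetadataSchema {
    private static final Map<CoreField, String> DESCRIPTORS;

    static {
        Map<CoreField, String> m = new EnumMap<>(CoreField.class);
        m.put(CoreField.CREATED, "dcterms:created");
        m.put(CoreField.MODIFIED, "dcterms:modified");
        m.put(CoreField.IDENTIFIER, "dcterms:identifier");
        m.put(CoreField.CREATOR, "dcterms:creator");
        m.put(CoreField.CONTRIBUTOR, "dcterms:contributor");
        m.put(CoreField.FORMAT, "dcterms:format");
        DESCRIPTORS = Collections.unmodifiableMap(m);
    }

    @Override
    public String descriptorFor(CoreField field) {
        return DESCRIPTORS.get(field);
    }
}
```

Because MetadataSchema has a single method, an application embedding JSPWiki in an existing document management system could supply its own mapping as a lambda, while JSPWiki code only ever asks the schema for a descriptor.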

----
! Recommendation

I agree that the schema for JSPWiki should use standards wherever
possible, and would advocate basing the reference implementation
of a metadata API on Dublin Core, given that it is the predominant
document metadata schema on the Web (other schemas either use it directly
or are heavily informed by it). Due to its origins at OCLC (publishers of
WorldCat), Dublin Core is used in almost the entirety of the world's
libraries for lightweight interchangeable metadata and is compatible
with and/or the basis of the designs used by the W3C and its "semantic
web".

When these terms don't suffice there are a variety of ways to extend
the set. An accepted way to do this is to create and publish (i.e.,
post on the web) an "application profile" for the local customisations
made. I am willing to both design and create the necessary documents
for a Dublin Core application profile for JSPWiki. Examples of these
documents (which are backed by an RDF document) are at

    http://dublincore.org/documents/2004/09/10/library-application-profile/
    http://www.natlib.govt.nz/dr/drterms.html
    http://www.natlib.govt.nz/dr/terms# (RDF document)

Note that there is no requirement that an application profile be
either submitted or approved by the DCMI. It's just playing nicely
by the rules to do so. For our purposes it'd just be a published
web page plus a static RDF document.

Below is the name and online comments for each proposed term followed by
my comments and/or recommendation for the term to be used in 3.0. I've
sorted the list to begin with those terms that can be supported directly
by the existing Dublin Core terms, followed by a set of terms to be
defined within a JSPWiki application profile. Within the profile would
be references to equivalent terms in other schemas where available and
appropriate.

I note that many of the proposed field names come from Atom. While this
is perhaps an appropriate usage, Atom is a syndication schema, not a
content repository schema. There's not a huge difference and Atom is in
large parts (semantically) compatible with and influenced by Dublin Core
(e.g., choice of atom:contributor). For documents stored in a repository I
believe Dublin Core is likely more appropriate.

----
! Historical Note

Historically, there are two Dublin Core schemas, DC.* and DCTERMS.*.
The original core set of fifteen Dublin Core Metadata Elements
(DC.*) have been grandfathered into the set of DC Terms (DCTERMS, see
footnote). For our purposes below, we can consider DC.* and DCTERMS.*
as identical namespaces (they by definition now are).

There used to be a qualification scheme whereby e.g., DC.date could be
qualified as DC.date.modified, but this has been dropped in favour of
having most of these qualified terms become full terms in their own right
within the DCTERMS namespace. Where they exist, I've included the DC.*
term or qualified term in square brackets below.

--------------

  * atom:updated As in RFC 4287. This is a DATE.

Recommendation: Use DCTERMS.modified. [DC.date or DC.date.modified]
DATE.

  * atom:published As in RFC 4287. As JSPWiki does not yet support
    "draft" pages, this is essentially a creation date. NB: This cannot
    be checked from page version #1, because that might be deleted.
    This is a DATE.

Recommendation: Use DCTERMS.created. Agreed: this must be carried
through all revisions since it provides a canonical container for the
origin date of the document. [DC.date or DC.date.created]
DATE.
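
Both date terms above are typed DATE. DCMI recommends expressing dates according to ISO 8601 or a profile of it such as the W3C Date and Time Formats note (W3CDTF). A minimal sketch, assuming dates are stored as UTC strings (JCR's native DATE property type may well be preferable); the helper name W3cDates is hypothetical:

```java
import java.time.Instant;

// Hypothetical helper for the DATE-typed terms (dcterms:created, dcterms:modified).
final class W3cDates {
    private W3cDates() {}

    // Formats as a UTC ISO 8601 / W3CDTF string, e.g. 2008-02-04T22:17:24Z.
    static String format(Instant instant) {
        return instant.toString();
    }

    // Parses a stored value; throws DateTimeParseException if malformed.
    static Instant parse(String value) {
        return Instant.parse(value);
    }
}
```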

  * atom:id As in RFC 4287. This has some advantages, and can easily be
    tied to the JCR jcr:uuid. This is a STRING.

Recommendation: Use DCTERMS.identifier. [DC.identifier]
    STRING (URI?)
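
If the identifier is to be a URI, one option (an assumption, not part of the design) is to wrap the node's jcr:uuid in the "urn:uuid:" URN namespace defined by RFC 4122. The helper name is hypothetical:

```java
import java.util.UUID;

// Hypothetical helper: dcterms:identifier value derived from a node's jcr:uuid.
final class PageIdentifiers {
    private PageIdentifiers() {}

    static String fromJcrUuid(String jcrUuid) {
        // UUID.fromString rejects malformed input; toString() emits lower-case hex.
        UUID uuid = UUID.fromString(jcrUuid);
        return "urn:uuid:" + uuid;
    }
}
```

This keeps the identifier stable across revisions and renames, since it is tied to the repository node rather than the page name.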

  * wiki:creator As with atom:published, the creator probably needs to
    be stored separately. Though on wikipages it might not be that useful.
    This is TBD.

Recommendation: Use DCTERMS.creator. The Atom specification seems to borrow
extensively from DC: atom:author is identical to the concept of
DC.creator (they apparently just didn't like the term 'creator' and
changed it to 'author'), while atom:contributor is used in the same manner
as DC.contributor (again, paraphrasing the terminology from DC). This will
need to be carried through all revisions, since we need to maintain the
original author ID regardless of whether a given revision still exists.
[DC.creator] STRING.
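
Since both dcterms:created and dcterms:creator must survive the deletion of revision #1, a new revision's metadata could be derived by copying those document-level terms forward and setting the revision-level ones afresh. A minimal sketch, assuming metadata is held as a simple map of term name to STRING value; the class name is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: derive the metadata of the next page revision.
final class RevisionMetadata {
    private RevisionMetadata() {}

    static Map<String, String> next(Map<String, String> previous,
                                    String contributor, String modified) {
        Map<String, String> next = new HashMap<>();
        // Document-level terms: carried through every revision.
        next.put("dcterms:created", previous.get("dcterms:created"));
        next.put("dcterms:creator", previous.get("dcterms:creator"));
        // Revision-level terms: describe this save only.
        next.put("dcterms:contributor", contributor);
        next.put("dcterms:modified", modified);
        return next;
    }
}
```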

  * wiki:author Denotes the identity of the user who saved this version
    of the page. This should probably be a reference to the user identity.
    It should also have a useful value in case the modification is done
    by the system automatically. This value should never be anything
    meaningless - in fact, I think that PageManager should throw an Exception
    if there is a missing attribute when saved. This is TBD.

Recommendation: Use DCTERMS.contributor.  The idea with DC.creator and
DC.contributor is that the former is the original creator (author) of
a resource, and any subsequent contributions (editing, translation, etc.)
are considered as being done by a 'contributor'. For the original author,
see wiki:creator (DC.creator) above. [DC.contributor]

    STRING.
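
The suggestion that PageManager throw an exception when a required attribute is missing could look roughly like the following save-time check. The class name and the choice of required terms are assumptions (the design still lists this as TBD):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical save-time check refusing revisions that lack required terms.
final class RequiredTerms {
    private RequiredTerms() {}

    // Assumed list; which terms are mandatory is not yet decided.
    static final List<String> REQUIRED =
            Arrays.asList("dcterms:contributor", "dcterms:modified");

    static void check(Map<String, String> metadata) {
        for (String term : REQUIRED) {
            String value = metadata.get(term);
            // Treat absent, empty and whitespace-only values as missing.
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalStateException("Missing required metadata term: " + term);
            }
        }
    }
}
```

Automatic, system-initiated saves would then need to supply some fixed but meaningful contributor identity rather than leaving the term empty.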

  * wiki:ipaddr The IP address where the last change occurred. The
    SpamFilter might then add some additional tags (in its own namespace).
    This is a STRING.

Recommendation: Use new application profile wiki:ipaddr. STRING.

  * wiki:content The actual content as a binary stream (BINARY)

Recommendation: Use new application profile wiki:content. Question:
why a binary stream rather than a STRING?
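
One possible reason for BINARY is that wiki:contentType explicitly allows non-text types such as "image/jpeg", which a STRING property cannot hold; text would then be recovered by decoding. A minimal sketch, assuming UTF-8 for wiki markup (an assumption, not a decided encoding); the class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder for wiki:content stored as raw bytes (BINARY).
final class PageContent {
    private final byte[] bytes;

    PageContent(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    // Wiki markup is encoded as UTF-8 before storage (assumed encoding).
    static PageContent ofText(String markup) {
        return new PageContent(markup.getBytes(StandardCharsets.UTF_8));
    }

    byte[] bytes() {
        return bytes.clone();
    }

    // Only meaningful when the page's format is a textual one.
    String asText() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```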

  * wiki:contentType The MIME type of the content. JSPWiki markup shall be
    denoted as "text/x-wiki.jspwiki". Creole as "text/x-wiki.creole".
    Other types are also allowed, e.g. "text/html" or "image/jpeg".

Recommendation: Use DCTERMS.format. This is the term used to contain
a format identifier. While I recognise that these discussions tend to
devolve rather quickly, I would strongly recommend considering an
Internet Media (MIME) Type of "application/*" instead of "text/*", e.g.,
"application/x-wiki+jspwiki". The history of "text/html" vs.
"application/html" suggests that text formats requiring a significant
amount of processing to render generally move towards being considered
applications rather than text formats (i.e., while they may be largely
human-readable, in practice they quickly become indecipherable or largely
unreadable when used with plugins and other complex syntax, as with many
if not most pages on Wikipedia).
[DC.format] STRING.
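
If the "application/*" form were adopted, values already written under the "text/x-wiki.*" names could be normalized on read. A sketch with a hypothetical class name; the Creole mapping is by analogy only, since the proposal above names just the JSPWiki type, and media-type parameters are not handled:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical normalizer for dcterms:format values.
final class WikiFormats {
    private WikiFormats() {}

    private static final Map<String, String> LEGACY = new HashMap<>();

    static {
        LEGACY.put("text/x-wiki.jspwiki", "application/x-wiki+jspwiki");
        // Assumed by analogy with the JSPWiki mapping.
        LEGACY.put("text/x-wiki.creole", "application/x-wiki+creole");
    }

    // Known legacy wiki types map to the application/* form; others pass through.
    static String normalize(String format) {
        String key = format.trim().toLowerCase(Locale.ROOT);
        return LEGACY.getOrDefault(key, format);
    }

    // True if the value denotes wiki markup that needs rendering.
    static boolean isWikiMarkup(String format) {
        return normalize(format).toLowerCase(Locale.ROOT).startsWith("application/x-wiki+");
    }
}
```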

  * wiki:acl The access control list for this page. Format TBD.

Recommendation: Use new application profile wiki:acl. TBD.

  * wiki:changenote A simple, text/plain description of the change.
    STRING.

Recommendation: Use new application profile wiki:changenote. The way
to do this in Dublin Core would likely be considered too complicated
for this application. The change note needs to be considered as
metadata of the revision, not the document.
STRING.

  * wiki:state Essentially an Enum defining the state of the page. Can
    be EXISTS or DELETED. Format TBD.

Recommendation: Use new application profile wiki:state. Enumerated value
set. STRING.
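
An enumerated value set stored as a STRING maps naturally onto a Java enum persisted by name. A minimal sketch with a hypothetical enum name, using the two states listed above:

```java
import java.util.Locale;

// Hypothetical enum for wiki:state, persisted as its name (a STRING).
enum PageState {
    EXISTS, DELETED;

    String toStoredValue() {
        return name();
    }

    // valueOf throws IllegalArgumentException for values outside the set.
    static PageState fromStoredValue(String value) {
        return PageState.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```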

  * wiki:minorchange This change is minor, and should not be shown in
    the changelog, though an actual change has been made.

Recommendation: Use new application profile wiki:minorchange. Not
labeled as BOOLEAN, but it appears to be one.
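
Treated as a BOOLEAN, wiki:minorchange would let the change log simply skip flagged revisions. A sketch, assuming the flag is stored as the string "true" or "false" and that an absent flag means a normal change; the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical change-log filter honouring wiki:minorchange.
final class ChangeLog {
    private ChangeLog() {}

    // Boolean.parseBoolean returns false for null or anything but "true" (any case).
    static boolean isMinor(Map<String, String> revision) {
        return Boolean.parseBoolean(revision.get("wiki:minorchange"));
    }

    // Revisions that should appear in the change log.
    static List<Map<String, String>> visible(List<Map<String, String>> revisions) {
        List<Map<String, String>> shown = new ArrayList<>();
        for (Map<String, String> r : revisions) {
            if (!isMinor(r)) {
                shown.add(r);
            }
        }
        return shown;
    }
}
```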

----

In summary, while I see Atom as interesting and in large part semantically
compatible with Dublin Core, I think it'd be better to incorporate a
schema that was designed more specifically for resources than for feeds;
the definitions fit more closely with our usage.
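
For reference, the recommendations above can be collected into a simple lookup table. A sketch only: the class name is hypothetical, and since the JSPWiki application profile's namespace URI is not yet defined, only dcterms: names are expanded to full URIs (using the DCMI terms namespace http://purl.org/dc/terms/):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical table of the term recommendations made in this message.
final class TermMapping {
    private TermMapping() {}

    static final String DCTERMS_NS = "http://purl.org/dc/terms/";

    static final Map<String, String> RECOMMENDED;

    static {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("atom:updated", "dcterms:modified");
        m.put("atom:published", "dcterms:created");
        m.put("atom:id", "dcterms:identifier");
        m.put("wiki:creator", "dcterms:creator");
        m.put("wiki:author", "dcterms:contributor");
        m.put("wiki:contentType", "dcterms:format");
        RECOMMENDED = Collections.unmodifiableMap(m);
    }

    // Names without a DC equivalent stay in the JSPWiki application profile.
    static String recommendedTerm(String proposed) {
        return RECOMMENDED.getOrDefault(proposed, proposed);
    }

    // Expands a dcterms: name to its URI; returns null for other prefixes.
    static String dctermsUri(String term) {
        String prefix = "dcterms:";
        return term.startsWith(prefix) ? DCTERMS_NS + term.substring(prefix.length()) : null;
    }
}
```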

As I mentioned above I consider these merely comments-on-the-path.

Murray


DCTERMS. The set of Dublin Core terms is found at
    http://dublincore.org/documents/dcmi-terms/
...........................................................................
Murray Altheim <murray07 at altheim.com>                           ===  = =
http://www.altheim.com/murray/                                     = =  ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk               = =  = =

       Boundless wind and moon - the eye within eyes,
       Inexhaustible heaven and earth - the light beyond light,
       The willow dark, the flower bright - ten thousand houses,
       Knock at any door - there's one who will respond.
                                       -- The Blue Cliff Record




