incubator-odf-dev mailing list archives

From Svante Schubert <svante.schub...@gmail.com>
Subject Re: [GSoC 2012] Add ODF 1.2 RDF metadata support to ODF Toolkit (Tao Lin)
Date Fri, 23 Mar 2012 00:15:14 GMT
Dear Tao Lin,
On 22.03.2012 09:59, Tao Lin wrote:
> Dear Svante,
>
> The answers and the latest example document [4b] helped me a lot. I'm
> now clear about the previous questions. I can understand the RDF
> metadata sections in ODF spec 1.2. Thank you very much!
>
> These days, I have been studying the online documents of ODF Toolkit.
> Now, I know the ODFDOM layers and the Simple API. I've also checked
> out the source code of ODF Toolkit from svn trunk. I'm thinking about
> where the code of the RDF metadata support should reside, i.e. in
> which layer: the high-level one or the low-level one? Here are some
> questions that I'd like to ask you:
>
> (1) For high level code:
> I find many of the classes in the package of
> "org.odftoolkit.odfdom.doc" have been marked as "deprecated". Will the
> whole ODF Document layer be eliminated from the architecture and
> finally replaced by the Simple API in future releases? If so, should
> we avoid putting the high level code of the RDF metadata support into
> the ODF Document layer? Is the Simple API the right place for high
> level code in this GSoC project?
You are correct. The high level DOC API will be replaced by the high
level Simple API, which is the right place for the high level metadata
access.
>
> (2) For low level code:
> ODFDOM contains a low level ODF XML Layer with the ODF DOM API. I'm
> not sure whether the RDF metadata support should work with this XML
> layer, and how. As a veteran programmer of RDF, I think the most
> convenient way to process RDF data is to use specialized tools, like
> Jena and Sesame. For example, Jena provides a direct API for parsing RDF
> files and for adding and removing RDF triples, so its users need not be
> aware of the underlying XML processing work. The mechanism of code
> generation from the ODF schema used by the ODF XML Layer may not apply
> to RDF metadata support. What do you think?
I agree with you, the handling of RDF should be left to a specialized
RDF library such as Jena or Sesame.
The toolkit itself has to gather the RDF files from the package and
collect the RDFa triples, including the text that has become the RDF
object of a triple.
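
To illustrate the first part of that job, here is a minimal sketch of
finding the RDF files in a package, using only the JDK ZIP classes. The
class name RdfFileCollector, the in-memory sample package and the entry
name "meta/custom.rdf" are assumptions made up for this illustration, not
part of the ODF Toolkit. A real implementation would presumably follow
the metadata files listed in manifest.rdf rather than rely on the file
extension.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not an ODF Toolkit class.
final class RdfFileCollector {

    /** Returns the names of all package entries whose name ends in ".rdf". */
    static List<String> listRdfEntries(InputStream packageStream) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(packageStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().endsWith(".rdf")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    /** Builds a tiny in-memory ZIP that stands in for an ODF package (assumed layout). */
    static byte[] samplePackage() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"content.xml", "manifest.rdf", "meta/custom.rdf"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Lists the RDF entries found in the sample package.
        System.out.println(listRdfEntries(new ByteArrayInputStream(samplePackage())));
    }
}
```

The streams found this way could then be handed to the RDF library of
choice for parsing.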

Have you ever heard of GRDDL? Basically it is only a mighty acronym
for a very simple technique to extract the RDF graph from an XML file
based on an XSLT stylesheet.
I once started such a GRDDL XSLT stylesheet for ODF; it can be
found at the OASIS ODF TC:
http://www.oasis-open.org/committees/document.php?document_id=30609&wg_abbrev=office-metadata
Although the XSLT is unfinished, as some reported missing features have
not yet been implemented, it can be seen as a different way to test the
complete extraction of RDF from the package, which I see as the main
scenario.

NOTE: I also mapped the complete meta.xml to RDF in a very
straightforward way, as this ODF XML metadata should be part of the RDF
graph as well.
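
To give an idea of the GRDDL technique, here is a minimal sketch that
applies an XSLT stylesheet to a meta.xml-like input with the JDK's
built-in XSLT processor. The stylesheet below is a toy written for this
illustration (it maps only dc:title to a single RDF property) and is not
the stylesheet from the OASIS ODF TC. The class name GrddlSketch and the
sample input are assumptions as well.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration, not an ODF Toolkit class.
final class GrddlSketch {

    // Toy stylesheet (assumption): copies every dc:title into an RDF/XML description.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns:dc='http://purl.org/dc/elements/1.1/'>"
        + "<xsl:template match='/'>"
        + "<rdf:RDF><rdf:Description rdf:about=''>"
        + "<xsl:for-each select='//dc:title'>"
        + "<dc:title><xsl:value-of select='.'/></dc:title>"
        + "</xsl:for-each>"
        + "</rdf:Description></rdf:RDF>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Sample input shaped like a very small meta.xml (assumption).
    static final String SAMPLE_META =
        "<office:document-meta"
        + " xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'"
        + " xmlns:dc='http://purl.org/dc/elements/1.1/'>"
        + "<office:meta><dc:title>Example title</dc:title></office:meta>"
        + "</office:document-meta>";

    /** Runs the stylesheet over the given XML and returns the resulting RDF/XML text. */
    static String toRdfXml(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(toRdfXml(SAMPLE_META));
    }
}
```

The RDF/XML produced this way can be loaded into an RDF library and
compared with the graph the toolkit extracts on its own.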
>
> (3) Based on the above two thoughts, I am inclined to put the RDF
> metadata support mostly in the high level, or more explicitly in the
> Simple API. It means that I'd like to design and develop the RDF
> metadata support as part of the Simple API. For example, I may enrich
> the API of "org.odftoolkit.simple.Document" with some methods like
> "addRDFMetadataForElement(elementPath, predicate, objectValue)". As
> another example, I should also design something like
> "org.odftoolkit.simple.rdf.RDFMeta" representing "manifest.rdf", with an
> API for adding/removing the MetadataFile information. What's your opinion?
Yes, the high level API is the convenient layer for users, as it
shields them from the nasty implementation details.
It should indeed be as simple as possible, while the lower layer handles
all the XML / RDF implementation work.

The detailed design and naming can be discussed a little later, as can
your preference for a Java RDF library and its pros and cons.

PS: I will be traveling from tomorrow midday to Tuesday evening,
so I might not have Internet access or be able to answer during that time!

Best regards,
Svante
>
> Best regards,
> Tao Lin
>
> On Sun, Mar 18, 2012 at 11:15 PM, Svante Schubert
> <svante.schubert@gmail.com> wrote:
>> Hello Tao Lin,
>>
>> Very pleased to meet you. You did impressive research and raised good
>> questions.
>>
>> Please find my answers below.
>>
>>
>> On 18.03.2012 12:02, Tao Lin wrote:
>>
>> Dear Sir/Madam,
>>
>> My name is Tao Lin, a third year undergraduate student from China. I'm
>> very interested in GSoC 2012 project: Add ODF 1.2 RDF metadata support
>> to ODF Toolkit. I have good knowledge of semantic technologies, such
>> as RDF, OWL, SPARQL. I'm also familiar with the mainstream Java based
>> RDF/OWL processing tools like Jena, Sesame, AllegroGraph. I have
>> strong Java coding skills with good knowledge of the software
>> design patterns. Last year, I was accepted by GSoC 2011 and
>> successfully completed a project for LanguageTool [6]. This summer,
>> I'd like to contribute to ODF community in this "RDF metadata support"
>> project, because I find my abilities match the project requirements
>> very well.
>>
>> I just studied the provided documents [1] [2], and the OWL file [5]. I
>> also found some slides [3] and a document [4] demonstrating some
>> examples. However, not all of the documents are up-to-date: [4] was
>> composed in 2007, and [3] was published last year. I can understand
>> most of the specification, but I'm quite confused by some parts
>> because of the inconsistency among the documents. Could you help me
>> with the following questions?
>>
>> In 2007 the specification was still changing (under construction),
>> hence the differences and the confusion. I should have stated that more
>> clearly in my presentation, which you referenced as [3].
>> A key event happened later, in October 2008, when I gave a
>> presentation to the W3C Semantic Interest Group at their TPAC -
>> http://www.w3.org/2008/10/TPAC/ to review the metadata work.
>> A major change followed. Before that, the manifest.rdf contained a
>> mapping between an ODF content & ID and a URN assigned to it for the
>> RDF graph.
>> This had been initiated by an RDFa expert within the OASIS sub-committee,
>> who stated that identification (identifying) is not the same as
>> localization (finding).
>> The W3C group, especially Sir Tim Berners-Lee, gave me the feedback that
>> this is wrong: the URN was an ill invention, and identification
>> & localization should be treated as the same, otherwise the Internet would
>> not have worked. Since then, we refer directly with relative URLs from the
>> manifest.rdf to metadata in the content/package. Sorry for the confusion.
>>
>>
>>
>> (1). As shown in [2], RDF Metadata are of two types:
>> 4.2.1 In Content Metadata (RDFa)
>> 4.2.2 manifest.rdf
>> Are both of them within the scope of this GSoC project? Or just the second
>> one?
>>
>> Both (or precisely all possible metadata), but there should be a generic
>> handling possible. For instance, RDFa would be accessed via ODF Toolkit API.
>> Likely to be added to ODFDOM, perhaps accessed even by generated
>> functionality (more about generation later).
>>
>>
>> (2) On page 12 of [3], is the old OWL Class "pkg:Package" replaced by
>> "pkg:Document"? I cannot find "pkg:Package" in [1], [2] or [5].
>>
>> The spec - as usual - is correct. The OWL class pkg:Package was dropped
>> for now to avoid boilerplate.
>> Some basic information on the ODF 1.2 specification:
>> Part 3 of the spec is the package format and might be reused by other
>> formats (unfortunately EPUB failed to reuse the package standard in their
>> latest spec and reinvented the wheel).
>> Part 1 of the spec is the ODF XML format and is based on part 3 (yes - the
>> numbering is confusing sometimes, I would usually expect "1" to be the ground
>> layer, but the number was obviously chosen by different criteria and is
>> in the end not important).
>>
>> From the package view a document is nothing more than a directory with a
>> mime type in the package - identified by information in the
>> /META-INF/manifest.xml file.
>> There is always a root document in an ODF package, but there might be sub
>> documents as well - like embedding a chart document within a spreadsheet,
>> see
>> http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part3.html#General
>>
>> Therefore, to answer a question asked later: the difference between the pkg:
>> and odf: prefix is the layer. A pkg:file can be any arbitrary file within an
>> ODF package, while odf:file can be only a file defined in part 1 (e.g.
>> content.xml, styles.xml, etc.). BTW you forgot one reference to an OWL (the
>> one of part 3), I have added it as [5a]
>>
>>
>> (3) On page 15 of [3], it uses "pkg:idref". But on page 7 of [4], it
>> shows "odf:idref". I cannot find the definition in [1], [2], or [5].
>> Which one is correct?
>>
>> Neither, it was one thing that was corrected by the W3C comment (see intro
>> text in the beginning)
>>
>>
>> (4) For "In Content Metadata", besides the 5 supported elements shown
>> on page 16 of [3], the additional 6th one is
>> "<table:covered-table-cell>" according to [2]. Is that true?
>>
>> The spec is like a blueprint and overrules everything. In particular, as the
>> spec is from 2011, it overrules my presentation from 2007.
>> Nevertheless I am happy if you question the spec as well, because even
>> the spec is created by humans and errors are still possible.
>>
>> ODF Toolkit allows the generation of sources by relying on the (Sun)
>> Multi-Schema Validator to parse the XML and the Apache Velocity template
>> engine to have text templates that allow access to a Java context, see
>> http://svn.apache.org/viewvc/incubator/odf/trunk/generator/
>> either generate the latest JavaDoc, or download an old one
>> https://oss.sonatype.org/content/groups/public/org/odftoolkit/schema2template/0.8.7/schema2template-0.8.7-javadoc.jar
>> and call "jar -xvf schema2template-0.8.7-javadoc.jar" and the index.html
>> will provide you a good detailed overview over the generator.
>>
>>
>> (5) As shown on page 17, for "In Content Metadata", we don't use
>> manifest.rdf to map the "xml:id" to RDF IRI, do we? I think there is
>> no "In Content Metadata" information in manifest.rdf. Is that true?
>>
>> Manifest.rdf is an RDF file, as the suffix indicates. Its purpose is to provide a
>> single point of information about metadata on the ODF package as described
>> in
>> http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part3.html#Metadata
>> There is no RDFa in the manifest.rdf, if this was your question.
>>
>>
>>
>> (6) On page 18 of [3], we have <text:meta-field>. What are the
>> differences between <text:meta-field> and <text:meta> (on page 16 of [3])?
>> Are they visible or invisible to users?
>>
>> Both include text with metadata.
>> text:meta is similar to a text:span with metadata.
>> text:meta-field is like text that was generated from metadata. Think of
>> citations that are generated in a certain way by your metadata-based
>> citation plugin. For instance, the text is regenerated whenever you choose
>> a new citation layout required by a different magazine you would like to
>> send the document to, see
>> http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#element-text_meta-field
>>
>>
>>
>> (7) How to use "odf:prefix" and "odf:suffix" for <text:meta-field>?
>> Can you show me some examples?
>>
>> Added the latest example document to the reference list as [4b] -
>> http://www.oasis-open.org/committees/document.php?document_id=34796&wg_abbrev=office-metadata
>> Unfortunately there is no example of it. In any case, it was required to define
>> the prefix and suffix of a field that is being generated. Use cases state
>> that such a prefix and suffix occur very often in a field.
>> Never mind, it is not up to you to create text-field functionality such as a
>> citation application. You only have to add access (read, write,
>> deletion) to these fields in the ODF Toolkit.
>>
>>
>> (8) In [5], we have "odf:Element" and "pkg:Element". What are the
>> differences? I'm also confused about the namespaces of "odf" and
>> "pkg". Sometimes we use "odf" (e.g. odf:ContentFile), while other times
>> we use "pkg" (e.g. pkg:MetadataFile). Why?
>>
>> It is because of modularity. As described already at #2, pkg is for every
>> application that is reusing the ODF package, like pkg:Element is an XML
>> element within an XML file in an ODF package (spec part 3).
>> odf:Element instead is as well used within an ODF package, but it also uses
>> ODF XML (spec part 1).
>>
>> Looking forward to hearing from you!
>>
>> [1]
>> http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part3.html#Metadata_Manifest_Files
>> [2]
>> http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#a4Metadata
>> [3] http://www.slideshare.net/jza/the-openofficeorg-odf-toolkit-project
>> [4]
>> http://www.oasis-open.org/committees/download.php/25054/07-08-22-MetaData-Examples.odt
>>
>> [4b]
>> http://www.oasis-open.org/committees/document.php?document_id=34796&wg_abbrev=office-metadata
>> [5a]
>> http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-package-metadata.owl
>>
>> [5]
>> http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-metadata.owl
>> [6] http://www.languagetool.org/gsoc2011/
>>
>> Yours faithfully,
>> Tao Lin
>>
>> If there are any further questions or I did not explain something clearly
>> enough, do not hesitate to ask again.
>>
>> Best regards,
>> Svante

