incubator-odf-dev mailing list archives

From Tao Lin <>
Subject Re: [GSoC 2012] Add ODF 1.2 RDF metadata support to ODF Toolkit (Tao Lin)
Date Sun, 25 Mar 2012 05:26:35 GMT
Dear Svante,

Thanks for your reply! I have coding experience with GRDDL-based RDF
extraction from XML using the Jena GRDDL Reader [1]. We can also easily use
the Jena GRDDL Reader for RDFa-only transformation of the In Content
Metadata of ODF [2] by setting the Reader property "grddl.rdfa" to "true"
[3]. Besides the Jena GRDDL Reader, there are other Java-based RDFa
extraction tools, such as java-rdfa [4] and java-core-rdfa [5]. Both of them
support Jena integration. Additionally, we may use OWL [6] in this project.
Is OWL processing a required function?

Based on the functional requirements above, the pros and cons of the
mainstream Java RDF libraries for this project are summarised below (the
tools in bold are those I have direct coding experience with):
- *Jena*: the best candidate; well known and widely used for RDF and OWL,
with RDFa and GRDDL support.
- *Sesame* [7]: a good candidate; well known and widely used for RDF, with a
good RDF API, but weaker RDFa, GRDDL and OWL support.
- *Mulgara* [8]: an RDF triplestore for semantic data persistence with a
Jena interface [9]; weaker RDFa, GRDDL and OWL support.
- *AllegroGraph* [10]: an RDF triplestore for semantic data persistence;
uses the Sesame RDF API [12] and has an integration API for Jena [11];
outstanding SPARQL query performance, but weaker RDFa, GRDDL and OWL
support, and a licence issue (not an open source licence) [13].
- JRDF [14], *Tupelo* [15]: RDF libraries, but inactive and last updated
years ago.
- OWLIM [16], OWLAPI [17], Pellet [18]: strong OWL API and reasoning
support, but no RDFa or GRDDL support.

In summary, I recommend Jena as the RDF library for this GSoC project.
According to the previous discussions with you, I think there are three
types of RDF data we need to consider for ODF. Here's the list of them and
corresponding technical solutions:
(1) RDF data in manifest.rdf [19]: use Jena ARP [20] for RDF/XML parsing,
and the related RDF API for adding/deleting/modifying/storing RDF triples.
(2) In Content Metadata (RDFa) [2]: use the Jena built-in GRDDL Reader [1]
for RDF triple extraction (or switch to another Jena-integrated reader, such
as java-rdfa [4] or java-core-rdfa [5]). Are adding/deleting/modifying/storing
functions required for this part?
(3) RDF data from ODF XML metadata (meta.xml): use the Jena built-in GRDDL
Reader [1] for RDF triple extraction. Are adding/deleting/modifying/storing
functions required for this part?
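
Before any RDF library is involved, the three sources listed above have to be pulled out of the ODF package, which is a ZIP archive. The following is only a minimal sketch of that step using the JDK alone. The class name OdfRdfSources and the demo package are hypothetical, and it assumes the root document's entries are named manifest.rdf, content.xml and meta.xml. The bytes it returns would then be handed to Jena (or another RDF library) for the actual parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the ODF Toolkit.
final class OdfRdfSources {
    // Assumed root-document entries carrying the three kinds of RDF-relevant data.
    static final List<String> SOURCES =
            Arrays.asList("manifest.rdf", "content.xml", "meta.xml");

    /** Returns the raw bytes of each RDF-relevant entry found in the package. */
    static Map<String, byte[]> collect(InputStream odfPackage) {
        Map<String, byte[]> found = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(odfPackage)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (SOURCES.contains(e.getName())) {
                    found.put(e.getName(), readAll(zip));
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return found;
    }

    // Reads the current ZIP entry to its end.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n = in.read(buf); n != -1; n = in.read(buf)) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Builds a tiny in-memory package so the sketch can be tried without a real document. */
    static byte[] demoPackage() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : Arrays.asList("mimetype", "manifest.rdf", "meta.xml")) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(("<!-- " + name + " -->").getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> found = collect(new ByteArrayInputStream(demoPackage()));
        System.out.println("RDF-relevant entries: " + found.keySet());
    }
}
```

Sub-documents inside a package may carry their own metadata files, which this sketch deliberately ignores.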

I think the project scope is almost clear now. Next, I'd like to think
about the project plan and come up with a project proposal in the coming
week. I have not found a project proposal template for the Apache Software
Foundation GSoC 2012 programme. I would greatly appreciate it if you could
tell me the expected structure and main content of the project proposal.
Could you please show me the template or some good proposal examples from
previously accepted GSoC students for the Apache Software Foundation? I'd
also like to know how the Apache Software Foundation ranks and selects
project proposals. Any suggestions or tips to increase the chance of
acceptance are welcome!

Best regards,
Tao Lin


On Fri, Mar 23, 2012 at 8:15 AM, Svante Schubert <> wrote:
> Dear Tao Lin,
> On 22.03.2012 09:59, Tao Lin wrote:
> Dear Svante,
> The answers and the latest example document [4b] helped me a lot. I'm
> now clear about the previous questions. I can understand the RDF
> metadata sections in ODF spec 1.2. Thank you very much!
> These days, I have been studying the online documents of ODF Toolkit.
> Now, I know the ODFDOM layers and the Simple API. I've also checked
> out the source code of ODF Toolkit from svn trunk. I'm thinking about
> where the code of the RDF metadata support should reside, i.e. in
> which layer: the high level layer or the low level one? Here are some
> questions that I'd like to ask for your help with:
> (1) For high level code:
> I find many of the classes in the package
> "org.odftoolkit.odfdom.doc" have been marked as "deprecated". Will the
> whole ODF Document layer be eliminated from the architecture and
> finally replaced by the Simple API in future releases? If so, should
> we avoid putting the high level code of the RDF metadata support into
> the ODF Document layer? Is the Simple API the right place for high
> level code in this GSoC project?
> You are correct. The high level DOC API will be replaced by the high level
> Simple API, which is the right place for the high level metadata access.
> (2) For low level code:
> ODFDOM contains a low level ODF XML Layer with the ODF DOM API. I'm
> not sure whether the RDF metadata support should involve this XML
> layer, and how. As a veteran RDF programmer, I think the most
> convenient way to process RDF data is to use specialised tools, like
> Jena and Sesame. For example, Jena provides a direct API for parsing RDF
> files and for adding and removing RDF triples, so its users need not be
> aware of the underlying XML processing work. The mechanism of code
> generation from the ODF schema used by the ODF XML Layer may not apply
> to RDF metadata support. What do you think?
> I agree with you, the handling of RDF should be part of a specialized RDF
> lib as Jena or Sesame.
> The RDF files in the package, the RDFa triples, and the text that has
> become the RDF object of a triple have to be extracted by the toolkit.
> Have you ever heard about GRDDL? Basically it is only a mighty acronym for
> a very simple technique to extract the RDF graph from an XML file based on
> an XSLT stylesheet.
> I once started such a GRDDL XSLT stylesheet for ODF and it can be
> found at the OASIS ODF TC
> Although the XSLT is unfinished, as some reported missing features have
> not yet been implemented, it can be seen as a different way to test the
> extraction of RDF from the package, which I see as the main scenario.
> NOTE: I also mapped the complete meta.xml to RDF in a very
> straightforward way, as this ODF XML metadata should be part of the RDF
> graph as well.
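
The GRDDL idea described above (an XSLT stylesheet that turns an XML file into RDF/XML) can be illustrated with the JDK's built-in XSLT processor. This is only a sketch under stated assumptions: the class name GrddlSketch and the one-template stylesheet are hypothetical and are not the stylesheet from the OASIS ODF TC. It maps only dc:title from meta.xml and uses an empty rdf:about to stand for the document itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration of GRDDL-style extraction, not ODF Toolkit code.
final class GrddlSketch {
    // Heavily reduced stylesheet: copies each dc:title in office:meta into one RDF/XML property.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'"
        + " xmlns:dc='http://purl.org/dc/elements/1.1/'"
        + " exclude-result-prefixes='office'>"
        + "<xsl:template match='/'>"
        + "<rdf:RDF><rdf:Description rdf:about=''>"
        + "<xsl:for-each select='//office:meta/dc:title'>"
        + "<dc:title><xsl:value-of select='.'/></dc:title>"
        + "</xsl:for-each>"
        + "</rdf:Description></rdf:RDF>"
        + "</xsl:template></xsl:stylesheet>";

    /** Applies the stylesheet to the given meta.xml content and returns RDF/XML text. */
    static String extract(String metaXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(metaXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("GRDDL-style transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        String metaXml =
              "<office:document-meta"
            + " xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'"
            + " xmlns:dc='http://purl.org/dc/elements/1.1/'>"
            + "<office:meta><dc:title>Demo</dc:title></office:meta>"
            + "</office:document-meta>";
        System.out.println(extract(metaXml));
    }
}
```

The resulting RDF/XML string could then be parsed by an RDF library such as Jena, which keeps the XML-to-RDF mapping in the stylesheet and out of the Java code.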
> (3) Based on the two thoughts above, I am inclined to put the RDF
> metadata support mostly in the high level, or more explicitly in the
> Simple API. It means that I'd like to design and develop the RDF
> metadata support as part of the Simple API. For example, I may enrich
> the API of "org.odftoolkit.simple.Document" with some methods like
> "addRDFMetadataForElement(elementPath, predicate, objectValue)". As
> another example, I should also design something like
> "org.odftoolkit.simple.rdf.RDFMeta" representing "manifest.rdf", with
> an API for adding/removing the MetadataFile information. What's your
> opinion?
> Yes, the high level API is the convenient layer for the user, which
> abstracts him from the nasty implementation details.
> It should indeed be as simple as possible, while the lower layer handles
> the XML / RDF implementation work.
> The detailed design & naming can be discussed a little later, as can
> your preference for a Java RDF library and the pros/cons of each.
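
To make the proposed high level method more concrete, here is a rough in-memory sketch of what "addRDFMetadataForElement(elementPath, predicate, objectValue)" could look like. Everything except that method name is an assumption made for illustration: the class RdfMetadataSketch, the Triple holder, the remove and get methods, and the "content.xml#id" style of element path are hypothetical and are not the Simple API. A real implementation would delegate storage to an RDF library and write to the package.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a high level metadata facade; not the Simple API.
final class RdfMetadataSketch {
    /** One statement: the annotated element, the predicate IRI and a literal object value. */
    static final class Triple {
        final String elementPath;
        final String predicate;
        final String objectValue;

        Triple(String elementPath, String predicate, String objectValue) {
            this.elementPath = Objects.requireNonNull(elementPath);
            this.predicate = Objects.requireNonNull(predicate);
            this.objectValue = Objects.requireNonNull(objectValue);
        }
    }

    // In-memory store only; nothing is written to manifest.rdf or content.xml here.
    private final List<Triple> triples = new ArrayList<>();

    /** Records a statement about the element identified by elementPath. */
    void addRDFMetadataForElement(String elementPath, String predicate, String objectValue) {
        triples.add(new Triple(elementPath, predicate, objectValue));
    }

    /** Removes all statements about the element and returns how many were removed. */
    int removeRDFMetadataForElement(String elementPath) {
        int before = triples.size();
        triples.removeIf(t -> t.elementPath.equals(elementPath));
        return before - triples.size();
    }

    /** Read-only view of the recorded statements. */
    List<Triple> getRDFMetadata() {
        return Collections.unmodifiableList(triples);
    }

    public static void main(String[] args) {
        RdfMetadataSketch meta = new RdfMetadataSketch();
        meta.addRDFMetadataForElement("content.xml#id1",
                "http://purl.org/dc/elements/1.1/creator", "Tao Lin");
        System.out.println("statements: " + meta.getRDFMetadata().size());
    }
}
```

The point of the sketch is the shape of the facade: the caller names an element, a predicate and a value, and never sees the XML or RDF serialisation underneath.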
> PS: I will be traveling from tomorrow midday to Tuesday evening, so I
> might not have Internet access or be able to answer during that time!
> Best regards,
> Svante
> Best regards,
> Tao Lin
> On Sun, Mar 18, 2012 at 11:15 PM, Svante Schubert
> <> wrote:
> Hello Tao Lin,
> Very pleased to meet you. You have done impressive research and raised
> good questions.
> Please find my answers below.
> On 18.03.2012 12:02, Tao Lin wrote:
> Dear Sir/Madam,
> My name is Tao Lin, a third year undergraduate student from China. I'm
> very interested in GSoC 2012 project: Add ODF 1.2 RDF metadata support
> to ODF Toolkit. I have good knowledge of semantic technologies, such
> as RDF, OWL, SPARQL. I'm also familiar with the mainstream Java based
> RDF/OWL processing tools like Jena, Sesame, AllegroGraph. I have
> strong Java coding skills and good knowledge of software
> design patterns. Last year, I was accepted by GSoC 2011 and
> successfully completed a project for LanguageTool [6]. This summer,
> I'd like to contribute to ODF community in this "RDF metadata support"
> project, because I find my abilities match the project requirements
> very well.
> I have just studied the provided documents [1] [2] and the OWL file [5]. I
> also found some slides [3] and a document [4] demonstrating some
> examples. However, not all of the documents are up-to-date: [4] was
> composed in 2007, and [3] was published last year. I can understand
> most of the specification, but I'm quite confused by some parts
> because of the inconsistency among the documents. Could you help me
> with the following questions?
> In 2007 the specification was still changing (or under construction),
> hence the differences and confusion. I should have stated this more
> clearly in my presentation, which you referenced as [3].
> There was a key event later, in October 2008, when I gave a
> presentation to the W3C Semantic Interest Group at their TPAC -
> to review the metadata work.
> There was a major change afterwards. Earlier there had been a mapping in
> manifest.rdf between an ODF content & ID and a URN being assigned to it in
> the manifest.rdf for the RDF graph.
> This was initiated by an RDFa expert within the OASIS sub-committee,
> who stated that identification (identifying) is not the same as
> localization (finding).
> The W3C group, especially Sir Tim Berners-Lee, gave me feedback that this
> is wrong: that the URN was an ill invention and that identification
> & localization should be treated as the same, otherwise the Internet would
> not have worked. Since then, we refer directly with relative URLs from the
> manifest.rdf to metadata in the content/package. Sorry for the confusion.
> (1) As shown in [2], there are two types of RDF metadata:
> 4.2.1 In Content Metadata (RDFa)
> 4.2.2 manifest.rdf
> Are both of them within the scope of this GSoC project? Or just the second
> one?
> Both (or, more precisely, all possible metadata), but generic handling
> should be possible. For instance, RDFa would be accessed via ODF Toolkit.
> It is likely to be added to ODFDOM, perhaps even accessed by generated
> functionality (more about generation later).
> (2) On page 12 of [3], is the old OWL class "pkg:Package" replaced by
> "pkg:Document"? I cannot find "pkg:Package" in [1], [2] or [5].
> The spec - as usual - is correct. The OWL class pkg:Package was dropped
> for now to avoid boilerplate.
> Some basic information on the ODF 1.2 specification:
> Part 3 of the spec is the package format and might be reused by other
> formats (unfortunately EPUB failed to reuse the package standard in their
> latest spec and reinvented the wheel).
> Part 1 of the spec is the ODF XML format and is based on part 3 (yes - the
> numbering is sometimes confusing; I would usually expect "1" to be the
> lowest layer, but the number was obviously chosen by different criteria and
> is ultimately not important).
> From the package view a document is nothing more than a directory with a
> mime type in the package - identified by information in the
> /META-INF/manifest.xml file.
> There is always a root document in an ODF package, but there might be sub
> documents as well - like embedding a chart document within a spreadsheet,
> see
> Therefore, to answer a question asked later: the difference between the
> pkg: and odf: prefixes is the layer. A pkg:file can be any arbitrary file
> within an ODF package, while an odf:file can only be a file defined in
> part 1 (e.g. content.xml, styles.xml, etc.). BTW you forgot one reference
> to an OWL file (the one of part 3); I have added it as [5a].
> (3) On page 15 of [3], it uses "pkg:idref". But on page 7 of [4], it
> shows "odf:idref". I cannot find the definition in [1], [2], or [5].
> Which one is correct?
> Neither, it was one thing that was corrected by the W3C comment (see intro
> text in the beginning)
> (4) For "In Content Metadata", besides the 5 supported elements shown
> on page 16 of [3], the additional 6th one is
> "<table:covered-table-cell>" according to [2]. Is that true?
> The spec is like a blueprint and overrules everything. Especially as the
> spec is from 2011, it overrules my presentation from 2007.
> Nevertheless I am happy if you question the spec as well, because even
> the spec is created by humans and errors are still possible.
> ODF Toolkit allows the generation of sources by relying on the (Sun)
> Multi-Schema Validator to parse the XML and the Apache Velocity template
> engine to have text templates that allow access to a Java context, see
> either generate the latest JavaDoc, or download an old one
> and call "jar -xvf schema2template-0.8.7-javadoc.jar" and the index.html
> will provide you a good detailed overview over the generator.
> (5) As shown on page 17, for "In Content Metadata", we don't use
> manifest.rdf to map the "xml:id" to an RDF IRI, do we? I think there is
> no "In Content Metadata" information in manifest.rdf. Is that true?
> Manifest.rdf is an RDF file, as the suffix indicates. Its purpose is to
> provide a single point of information about metadata in the ODF package,
> as described in
> There is no RDFa in the manifest.rdf, if this was your question.
> (6) On page 18 of [3], we have <text:meta-field>. What are the
> differences between <text:meta-field> and <text:meta> (on page 16 of [3])?
> Are they visible or invisible to users?
> Both include text with metadata.
> text:meta is similar to a text:span with metadata.
> text:meta-field is like text that was generated from metadata. Think of
> citations that are generated in a certain way by your metadata-based
> citation plugin. For instance, a citation is regenerated whenever you
> choose a new citation layout required by a different magazine you would
> like to send the document to, see
> (7) How to use "odf:prefix" and "odf:suffix" for <text:meta-field>?
> Can you show me some examples?
> Added the latest example document to the reference list as [4b] -
> Unfortunately there is no example about it, anyway it was required to
> the pre- and suffix of a field that was being generated. Use cases state
> there is very often such pre- and suffix in a field.
> Never mind, it is not up to you to create text-field functionality such as
> a citation application. You only have to add the access (read, write,
> deletion) to these fields in the ODF Toolkit.
> (8) In [5], we have "odf:Element" and "pkg:Element". What are the
> differences? I'm also confused about the namespaces "odf" and
> "pkg". Sometimes we use "odf" (e.g. odf:ContentFile), while at other times
> we use "pkg" (e.g. pkg:MetadataFile). Why?
> It is because of modularity. As already described in #2, pkg is for every
> application that reuses the ODF package; for example, pkg:Element is an
> XML element within an XML file in an ODF package (spec part 3).
> odf:Element, in contrast, is also used within an ODF package, but it is
> also ODF XML (spec part 1).
> Looking forward to hearing from you!
> [1]
> [2]
> [3]
> [4]
> [4b]
> [5a]
> [5]
> [6]
> Yours faithfully,
> Tao Lin
> If there are any further questions or I did not explain something clearly
> enough, do not hesitate to ask again.
> Best regards,
> Svante
