incubator-any23-dev mailing list archives

From "Ben Companjen (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ANY23-65) Update to RDFa extraction stylesheet
Date Sat, 24 Mar 2012 12:50:25 GMT

    [ https://issues.apache.org/jira/browse/ANY23-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237505#comment-13237505 ]

Ben Companjen commented on ANY23-65:
------------------------------------

One step further: the build fails with my updated stylesheet because the tests try to extract
triples from rdfa-11-curies.html, which has prefix="db:http://database.org/ dc:http://purl.org/dc/01/"
(no space between prefix and namespace URI). Even though this doesn't follow the spec, I'm starting
to believe it is important to support namespace definitions both with and without a space between
the prefix and the namespace URI. Perhaps an extra template "tokenize2a", called when
not(contains(@prefix, ': ')), is all it takes. (Just thinking out loud here.)
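The dispatch suggested above could be sketched in Python (rather than XSLT) roughly as follows. The function names are hypothetical stand-ins for the "tokenize2"/"tokenize2a" templates, not Any23 code. Whitespace is collapsed first, much as XPath normalize-space() would do, so that a line break after the colon still counts as a space.

```python
def tokenize_spaced(prefix_attr):
    """Hypothetical analogue of tokenize2 (RDFa 1.1 form): 'ns1: http://example.org/'."""
    tokens = prefix_attr.split()
    # Tokens alternate: 'ns1:', 'http://example.org/', 'ns2:', ...
    return {name.rstrip(":"): uri for name, uri in zip(tokens[0::2], tokens[1::2])}


def tokenize_unspaced(prefix_attr):
    """Hypothetical analogue of tokenize2a (legacy form): 'ns1:http://example.org/'."""
    mappings = {}
    for token in prefix_attr.split():
        # Split at the first colon only, so the URI's own 'http:' stays intact.
        name, _, uri = token.partition(":")
        mappings[name] = uri
    return mappings


def parse_prefix(prefix_attr):
    # Collapse runs of whitespace (including line breaks) into single spaces.
    normalized = " ".join(prefix_attr.split())
    # Mirrors the proposed test not(contains(@prefix, ': ')).
    if ": " not in normalized:
        return tokenize_unspaced(normalized)
    return tokenize_spaced(normalized)


print(parse_prefix("db:http://database.org/ dc:http://purl.org/dc/01/"))
print(parse_prefix("ns1: http://example.org/ ns2: http://example.com/"))
```

Note that this picks one form for the whole attribute, so an attribute mixing both styles would still be misread.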
                
> Update to RDFa extraction stylesheet
> ------------------------------------
>
>                 Key: ANY23-65
>                 URL: https://issues.apache.org/jira/browse/ANY23-65
>             Project: Apache Any23
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Ben Companjen
>              Labels: patch, xslt
>         Attachments: rdfa.xslt, stylesheet.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> The RDFa 1.1 Core specification asks that namespace prefixes in HTML5 be declared in a "prefix"
> attribute like this: "ns1: http://example.org/ ns2: http://example.com/"
> My sample HTML page does this, but Sindice, which uses Any23, didn't read my namespaces
> correctly. I narrowed the problem down to the XSLT template "tokenize2" in the rdfa.xslt
> stylesheet and changed it accordingly. The template expected "ns1:http://example.org/ ns2:http://example.com/"
> (no space between prefix and namespace URI) and did not normalize whitespace such as line breaks
> (although I'm not sure that broke the functionality).
> I use Any23 0.6.1 locally, but http://svn.apache.org/viewvc/incubator/any23/trunk/core/src/main/resources/org/apache/any23/extractor/rdfa/rdfa.xslt?revision=1231556&view=markup
> shows that the template is the same in trunk.
> A possible problem is that the new template will not accept non-spaced namespace
> definitions, like those in the RDFa produced by Best Buy. A further improvement to
> my template could be to accept both spaced and non-spaced namespace definitions.
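A single-pass parser that accepts both forms, even mixed in one attribute and split across lines, might look like the Python sketch below. It is hypothetical and not how rdfa.xslt works. It assumes a token ending in ":" is a prefix whose URI is the next token, and that any other token containing a colon is a non-spaced "prefix:URI" pair.

```python
def parse_prefix_mixed(prefix_attr):
    """Hypothetical parser accepting 'ns: URI' and 'ns:URI' in the same attribute."""
    mappings = {}
    pending = None  # prefix seen as 'ns:' that is still waiting for its URI token
    # split() with no argument also treats line breaks and tabs as separators.
    for token in prefix_attr.split():
        if pending is not None:
            # Spaced form: this token is the URI for the prefix seen just before.
            mappings[pending] = token
            pending = None
        elif token.endswith(":"):
            # Spaced form: remember the prefix, its URI is the next token.
            pending = token[:-1]
        else:
            # Non-spaced form: split at the first colon only.
            name, sep, uri = token.partition(":")
            if sep and uri:
                mappings[name] = uri
            # Tokens without a usable colon are silently ignored in this sketch.
    return mappings


print(parse_prefix_mixed("ns1: http://example.org/\n  db:http://database.org/"))
```

Unlike a contains(@prefix, ': ') test, this does not choose one form for the whole attribute, so a page mixing both styles still yields every mapping.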

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
