any23-dev mailing list archives

From "Lev Khomich (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ANY23-137) RDFa parser implementation proposal
Date Mon, 03 Mar 2014 12:31:21 GMT

    [ https://issues.apache.org/jira/browse/ANY23-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918011#comment-13918011 ]

Lev Khomich edited comment on ANY23-137 at 3/3/14 12:29 PM:
------------------------------------------------------------

Thanks, Stephane!

I completely missed that RDFa is used as part of the extraction process in other tests.
I've added the related fixes.

Brief description.

*ServletTest*
The old RDFa implementation produces
{{<issue level="Warning" row="14" col="5">Error while processing node /HTML(1)/HEAD(1)/META(9) : 'Cannot map prefix 'fb''</issue>}}
although {{<fb:app_id>}} is a completely valid predicate that shouldn't be resolved against
the fb: prefix.
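
To illustrate the point about {{<fb:app_id>}}, here is a minimal sketch of one way a property value can be resolved. It assumes RDFa 1.1-style handling, where a value whose prefix is not declared is kept as an absolute IRI instead of being reported as an unmappable prefix. {{PredicateResolver}} and {{resolve}} are hypothetical names used only for this example; they are not part of Any23 or semargl.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not Any23 or semargl API. */
final class PredicateResolver {

    /**
     * Resolves a property value of the form "prefix:reference".
     * A declared prefix is expanded as a CURIE; an undeclared one is
     * assumed to be the scheme of an absolute IRI and returned unchanged.
     */
    static String resolve(String value, Map<String, String> prefixes) {
        int colon = value.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Not a CURIE or absolute IRI: " + value);
        }
        String prefix = value.substring(0, colon);
        String namespace = prefixes.get(prefix);
        if (namespace != null) {
            // Declared prefix: concatenate namespace and reference.
            return namespace + value.substring(colon + 1);
        }
        // Undeclared prefix: keep the value as an absolute IRI, no warning.
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = Map.of("dc", "http://purl.org/dc/terms/");
        System.out.println(resolve("dc:title", prefixes));  // http://purl.org/dc/terms/title
        System.out.println(resolve("fb:app_id", prefixes)); // fb:app_id
    }
}
```

With this approach the undeclared fb: prefix no longer raises a "Cannot map prefix" issue; the predicate is simply emitted as {{<fb:app_id>}}.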

*Any23Test*
*RoverTest*
Changed RDFXMLWriter to NTriplesWriter in some tests to improve precision (they basically
check the line count).
Changed the expected triple counts. They were reduced in most cases, because the old RDFa parser
produced a lot of invalid triples like:

{quote}
<http://host.com/service> <http://host.com/serviceexternal> <http://host.com/service/ambiente/> .
<http://host.com/service> <http://host.com/serviceexternal> <http://host.com/service/salute/> .
<http://host.com/service> <http://host.com/serviceexternal> <http://host.com/service/legalita/> .
<http://host.com/service> <http://host.com/serviceexternal> <http://www.ansamed.info/> .
<http://host.com/service> <http://host.com/serviceexternal> <http://host.com/service/web/notizie/regioni/lazio/provinciadiroma/> .
{quote}
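
The switch from RDFXMLWriter to NTriplesWriter matters for tests that check line counts because N-Triples puts exactly one statement on each line, so the number of lines equals the number of triples. A rough sketch of such a check follows. {{NTriplesLineCount}} and {{countTriples}} are hypothetical names for this example and are not taken from the Any23 test suite.

```java
/** Hypothetical helper for illustration; not part of the Any23 test suite. */
final class NTriplesLineCount {

    /** Counts statements in N-Triples text: one per non-blank, non-comment line. */
    static long countTriples(String nTriples) {
        return nTriples.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .count();
    }

    public static void main(String[] args) {
        // Three of the triples quoted above, one statement per line.
        String output = String.join("\n",
                "<http://host.com/service> <http://host.com/serviceexternal> <http://host.com/service/ambiente/> .",
                "<http://host.com/service> <http://host.com/serviceexternal> <http://host.com/service/salute/> .",
                "<http://host.com/service> <http://host.com/serviceexternal> <http://www.ansamed.info/> .");
        System.out.println(countTriples(output)); // 3
    }
}
```

The same count on RDF/XML output would depend on how the writer groups and indents elements, which is why line counts are less precise there.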

Fixed the markup in {{test-resources/src/test/resources/html/rdfa/ansa_2010-02-26_12645863.html}}
to conform to the declared XHTML 1.0 Strict.
Fixed the RDFa markup in {{test-resources/src/test/resources/html/encoding-test.html}}; otherwise
it shouldn't produce any triples.
Disabled the second part of {{Any23Test.testExtractionParameters}}. Should it do anything after
the RDFa parser replacement?

Also, the ExtractionException thrown from BaseRDFExtractor is escalated in the test suite, which
leads to some failed tests in Any23Test. What's the correct behaviour for the Any23 parser when it
gets a SAXException?




> RDFa parser implementation proposal
> -----------------------------------
>
>                 Key: ANY23-137
>                 URL: https://issues.apache.org/jira/browse/ANY23-137
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.8.0
>            Reporter: Lev Khomich
>            Assignee: Peter Ansell
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: oQYfomKX.part, rdfa-extractor-proposal.patch
>
>
> As a follow-up to discussion [1],
> I've implemented another RDFa extractor for Any23 (0.7.1).
> The proposed code depends on the semargl project [2]. It isn't published in Maven
> Central, so I didn't change any POMs.
> I'm still not quite sure about the class name (the related ones are already taken),
> so feel free to rename it. See the attachments for a patch with the extractor and tests.
> [1] http://mail-archives.apache.org/mod_mbox/any23-dev/201212.mbox/browser
> [2] http://semarglproject.org



--
This message was sent by Atlassian JIRA
(v6.2#6252)
