any23-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ANY23-137) RDFa parser implementation proposal
Date Fri, 11 Apr 2014 17:06:20 GMT

    [ https://issues.apache.org/jira/browse/ANY23-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966791#comment-13966791
] 

Lewis John McGibbney commented on ANY23-137:
--------------------------------------------

Regarding the first question above: it all looks good. The changes in {noformat}Any23Test.testExtractionParameters{noformat}
appear to be purely aesthetic reformatting rather than functional changes.

I do not think that there is any _standard_ for catching SAXException. In the past (ANY23-115),
for example, when we discovered that empty spans break extraction of some documents, we decided
to simply replace empty spans with the String "null". This way the parse and extraction of the
entire page is not lost. I would be supportive of similar measures when we encounter a
SAXException as well.
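
As a rough illustration of the idea above, here is a minimal sketch using only the JDK XML APIs. It is not Any23 code: the class LenientParseSketch and the methods fillEmptySpans and parseLeniently are hypothetical names, and the regular expression is only an assumption about what an "empty span" looks like. The sketch fills empty spans with the text "null" before parsing, and turns a SAXException into an empty result so that a caller could carry on with the rest of the extraction.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of Any23.
public class LenientParseSketch {

    // Assumed shape of an "empty span": optional attributes, only whitespace inside.
    private static final Pattern EMPTY_SPAN =
            Pattern.compile("<span((?:\\s[^>]*)?)>\\s*</span>");

    /** Replaces the content of empty span elements with the text "null". */
    static String fillEmptySpans(String markup) {
        return EMPTY_SPAN.matcher(markup).replaceAll("<span$1>null</span>");
    }

    /**
     * Parses the markup as XML. A SAXException yields an empty Optional
     * instead of propagating, so one bad document does not abort the caller.
     */
    static Optional<Document> parseLeniently(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            InputSource source =
                    new InputSource(new StringReader(fillEmptySpans(markup)));
            return Optional.of(builder.parse(source));
        } catch (SAXException e) {
            // Malformed input: report "nothing extracted" rather than failing.
            return Optional.empty();
        } catch (ParserConfigurationException | IOException e) {
            // Environment problems are not recoverable here.
            throw new IllegalStateException(e);
        }
    }
}
```

Whether an empty result or a placeholder value is the better fallback depends on what the calling extractor expects; a real change would presumably also log the swallowed exception.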

> RDFa parser implementation proposal
> -----------------------------------
>
>                 Key: ANY23-137
>                 URL: https://issues.apache.org/jira/browse/ANY23-137
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.8.0
>            Reporter: Lev Khomich
>            Assignee: Peter Ansell
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: oQYfomKX.part, rdfa-extractor-proposal.patch
>
>
> As a follow up to discussion [1].
> I've implemented another RDFa extractor for Any23 (0.7.1).
> Proposed code depends on semargl project [2].
> Pull request located at [3].
> [1] http://mail-archives.apache.org/mod_mbox/any23-dev/201212.mbox/browser
> [2] http://semarglproject.org
> [3] https://github.com/apache/any23/pull/2



--
This message was sent by Atlassian JIRA
(v6.2#6252)
