any23-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ANY23-326) parsing unclosed meta and input tags fails
Date Thu, 25 Jan 2018 05:13:00 GMT

    [ https://issues.apache.org/jira/browse/ANY23-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338740#comment-16338740 ]

Hudson commented on ANY23-326:
------------------------------

SUCCESS: Integrated in Jenkins build Any23-trunk #1526 (See [https://builds.apache.org/job/Any23-trunk/1526/])
ANY23-326 fixed rdfa issue with unclosed input & meta tags (Hans: rev eefa208db3b4ad176ab3636fb3cc539bc00ea100)
* (edit) api/src/main/resources/default-configuration.properties
* (edit) core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java
* (edit) core/src/main/java/org/apache/any23/extractor/html/TagSoupParser.java
* (edit) core/src/main/java/org/apache/any23/extractor/html/TagSoupParsingConfiguration.java
* (add) core/src/main/java/org/apache/any23/extractor/html/JsoupUtils.java


> parsing unclosed meta and input tags fails
> ------------------------------------------
>
>                 Key: ANY23-326
>                 URL: https://issues.apache.org/jira/browse/ANY23-326
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 2.1
>         Environment: ubuntu 17.04
>            Reporter: Ben Roberts
>            Priority: Major
>             Fix For: 2.2
>
>
> Parsing fails as soon as it hits an unclosed input or meta tag. As an example, try:
>  ./bin/any23 rover https://ben.thatmustbe.me/note/2017/12/28/1
> [Fatal Error] :170:3: The element type "input" must be terminated by the matching end-tag "</input>".
>  
> It seems the issue might be that this is using a very old version of jsoup, at least as best I could tell.
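
The error quoted above is typical of a strict XML parser being handed HTML, where void elements such as input and meta are normally left unclosed. The sketch below is not Any23's code and not the fix from this commit. It only uses the JDK's built-in XML parser to reproduce the same class of failure on a tiny snippet. The class name UnclosedTagDemo and the method strictParseError are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not part of Any23.
class UnclosedTagDemo {

    /**
     * Parses the markup with the JDK's strict XML parser.
     * Returns null if it is well-formed XML, otherwise the parser's error message.
     */
    static String strictParseError(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            byte[] bytes = markup.getBytes(StandardCharsets.UTF_8);
            // The JDK's default error handler may also print a
            // "[Fatal Error]" line to stderr when parsing fails.
            builder.parse(new ByteArrayInputStream(bytes));
            return null;
        } catch (SAXParseException e) {
            // Well-formedness violation, e.g. an element that is never closed.
            return e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Unexpected parser failure", e);
        }
    }

    public static void main(String[] args) {
        // Valid HTML, but not well-formed XML: <input> is never closed.
        String html = "<form><input type=\"text\" name=\"q\"></form>";
        // The same markup with a self-closing <input/>, which is well-formed XML.
        String xhtml = "<form><input type=\"text\" name=\"q\"/></form>";

        System.out.println("unclosed input:    " + strictParseError(html));
        System.out.println("self-closed input: " + strictParseError(xhtml));
    }
}
```

Running main should report a parse error for the first snippet and none for the second. The exact wording of the message depends on the JDK's parser and locale.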



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
