any23-dev mailing list archives

From "Hans Brende (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ANY23-326) parsing unclosed meta and input tags fails
Date Wed, 24 Jan 2018 09:44:00 GMT

    [ https://issues.apache.org/jira/browse/ANY23-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337280#comment-16337280 ]

Hans Brende commented on ANY23-326:
-----------------------------------

Just for reference, here is the actual stack trace:

org.eclipse.rdf4j.rio.RDFParseException: org.xml.sax.SAXParseException; lineNumber: 170; columnNumber: 3; The element type "input" must be terminated by the matching end-tag "</input>".
 at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:111)
 at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:95)
 at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:117)
 at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:47)
 at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:473)
 at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:261)
 at org.apache.any23.Any23.extract(Any23.java:300)
 at org.apache.any23.Any23.extract(Any23.java:452)
 at org.apache.any23.cli.Rover.performExtraction(Rover.java:182)
...
Caused by: org.semarglproject.rdf.ParseException: org.xml.sax.SAXParseException; lineNumber: 170; columnNumber: 3; The element type "input" must be terminated by the matching end-tag "</input>".
 at org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1141)
 at org.semarglproject.source.XmlSource.process(XmlSource.java:50)
 at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
 at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
 at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
 at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:109)
 ... 10 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 170; columnNumber: 3; The element type "input" must be terminated by the matching end-tag "</input>".
 at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
 at org.semarglproject.source.XmlSource.process(XmlSource.java:48)
 ... 14 more
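
The innermost frame in the trace is org.apache.xerces.parsers.AbstractSAXParser, called from Semargl's XmlSource, so the page is being handed to a strict XML parser. In HTML, meta and input are void elements that never take an end tag, which makes markup such as <input type="text"> valid HTML but not well-formed XML. The minimal sketch that follows reproduces that distinction using the JDK's built-in SAX parser as a stand-in (an assumption: it is neither the Xerces copy nor the Any23/Semargl code path from the trace), and the class and method names are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class (not part of Any23 or Semargl): shows why a strict
// XML (SAX) parser rejects HTML void elements such as <meta> and <input>
// when they are written without an end tag.
public class UnclosedVoidElementDemo {

    /**
     * Parses the markup with the JDK's default SAX parser.
     * Returns null if it is well-formed XML, otherwise the first
     * well-formedness error reported by the parser.
     */
    static SAXParseException tryParseAsXml(String markup) {
        try {
            // DefaultHandler's fatalError() rethrows the SAXParseException,
            // so parsing stops at the first well-formedness error.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e;
        } catch (Exception e) {
            // Parser configuration or I/O failures are outside the scope of this demo.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML style: <meta> and <input> are void elements with no end tag.
        String html = "<html><head><meta charset=\"utf-8\"></head>"
                + "<body><input type=\"text\"></body></html>";
        // XHTML style: the same elements written as self-closing tags.
        String xhtml = "<html><head><meta charset=\"utf-8\"/></head>"
                + "<body><input type=\"text\"/></body></html>";

        for (String markup : new String[] {html, xhtml}) {
            SAXParseException error = tryParseAsXml(markup);
            System.out.println(error == null
                    ? "well-formed XML: " + markup
                    : "rejected at column " + error.getColumnNumber() + ": " + error.getMessage());
        }
    }
}
```

Run as written, the HTML-style string should be rejected (the first unclosed element the parser meets there is meta), while the self-closing XHTML-style string parses. This only illustrates why an XML-based parsing path fails on such pages; it is not a proposed fix for Any23.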

> parsing unclosed meta and input tags fails
> ------------------------------------------
>
>                 Key: ANY23-326
>                 URL: https://issues.apache.org/jira/browse/ANY23-326
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 2.1
>         Environment: ubuntu 17.04
>            Reporter: Ben Roberts
>            Priority: Major
>             Fix For: 2.2
>
>
> Parsing fails as soon as it hits an unclosed input or meta tag. As an example, try:
>  ./bin/any23 rover https://ben.thatmustbe.me/note/2017/12/28/1
> [Fatal Error] :170:3: The element type "input" must be terminated by the matching end-tag "</input>".
>  
> It seems like the issue might be that this is using a very old version of jsoup, at least as far as I could tell.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
