any23-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ANY23-314) Service fails to return extraction in case of extraction error
Date Tue, 19 Dec 2017 12:47:00 GMT

    [ https://issues.apache.org/jira/browse/ANY23-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296739#comment-16296739 ]

ASF GitHub Bot commented on ANY23-314:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/any23/pull/49


> Service fails to return extraction in case of extraction error
> --------------------------------------------------------------
>
>                 Key: ANY23-314
>                 URL: https://issues.apache.org/jira/browse/ANY23-314
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: service
>    Affects Versions: 2.1
>         Environment: Any23 2.2-SNAPSHOT
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 2.2
>
>         Attachments: extraction.json, output.log
>
>
> See the following command line extraction
> {code}
> lmcgibbn@LMC-056430 /usr/local/any23(master) $ ./cli/target/appassembler/bin/any23 rover -l output.log -o extraction.json https://www.jobcluster.de
> ------------------------------------------------------------------------
> Apache Any23 :: rover
> ------------------------------------------------------------------------
> 0    [main] WARN  org.apache.tika.parser.image.ImageParser  - JBIG2ImageReader not loaded. jbig2 files will be ignored
> 128  [main] INFO  org.apache.any23.rdf.PopularPrefixes  - Loading prefixes from /org/apache/any23/prefixes/prefixes.properties
> 1388 [main] WARN  org.apache.commons.httpclient.HttpMethodBase  - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> 4790 [main] INFO  org.apache.any23.extractor.SingleDocumentExtraction  - Processing https://www.jobcluster.de/
> [Fatal Error] :12:46: The entity name must immediately follow the '&' in the entity reference.
> ------------------------------------------------------------------------
> Apache Any23 FAILURE
> Execution terminated with errors: Error while parsing RDF document.
> Total time: 5s
> Finished at: Tue Dec 12 08:01:14 PST 2017
> Final Memory: 31M/184M
> ------------------------------------------------------------------------
> {code}
> This results in the attached extraction result (extraction.json) and the associated log (output.log).
> If I run the same extraction using the service at [any23.org|http://any23.org/any23/?format=json&uri=https%3A%2F%2Fwww.jobcluster.de%2F&validation-mode=none], the (partial) extraction result should be returned regardless of whether the entire extraction was successful.
> The service servlet seems to be returning the extraction Exception as opposed to the preferred extraction result. This issue will fix that.
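The behaviour requested in the issue (keep whatever was extracted before the failure and report the error alongside it, instead of replacing the output with the exception) can be illustrated with a minimal, self-contained Java sketch. It is hypothetical: it is not Any23's API and not the change merged in pull request #49. The names PartialExtractionDemo, StatementSink, Extractor, Result and render are invented for illustration, the servlet is reduced to a method that builds a response body, and Java 16+ (records) is assumed.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Any23 code): run an extraction that may fail part-way,
 * keep every statement emitted before the failure, and report the error alongside
 * the partial result instead of replacing the result with the exception.
 */
public class PartialExtractionDemo {

    /** Hypothetical stand-in for a triple handler: receives statements as they are extracted. */
    interface StatementSink {
        void receive(String statement);
    }

    /** Hypothetical stand-in for an extractor that may throw mid-document. */
    interface Extractor {
        void run(StatementSink sink) throws Exception;
    }

    /** Partial (or complete) output plus the error message, if any, that stopped extraction. */
    record Result(List<String> statements, String error) {
        boolean complete() {
            return error == null;
        }
    }

    /** Buffers statements; on failure, records the error but keeps what was already buffered. */
    static Result extract(Extractor extractor) {
        List<String> buffer = new ArrayList<>();
        String error = null;
        try {
            extractor.run(buffer::add);
        } catch (Exception e) {
            error = e.getMessage();
        }
        return new Result(List.copyOf(buffer), error);
    }

    /** Builds a service-style response body: the statements first, then the error as a trailer. */
    static String render(Result result) {
        StringBuilder body = new StringBuilder();
        for (String s : result.statements()) {
            body.append(s).append('\n');
        }
        if (!result.complete()) {
            body.append("# extraction incomplete: ").append(result.error()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Simulated extractor: emits two statements, then hits a parse error.
        Extractor failing = sink -> {
            sink.receive("<https://example.org/> <http://purl.org/dc/terms/title> \"Home\" .");
            sink.receive("<https://example.org/> <http://purl.org/dc/terms/language> \"de\" .");
            throw new Exception("Error while parsing RDF document.");
        };
        System.out.print(render(extract(failing)));
    }
}
```

Run as a plain Java program, the sketch prints the two statements emitted before the simulated parse error, followed by a trailing comment line noting that the extraction was incomplete. Buffering everything before writing the response is only one design choice; a streaming writer that appends the error at the end would be another.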



--
This message was sent by Atlassian JIRA (v6.4.14#64029)
